UCSC Genomics Institute Computing Infrastructure Information
wikiGIdb
http://giwiki.gi.ucsc.edu/index.php?title=Genomics_Institute_Computing_Information
MediaWiki 1.40.0
Main Page
Revision 1 · 2018-04-23T19:40:47Z · MediaWiki default
<strong>MediaWiki has been successfully installed.</strong>
Consult the [//meta.wikimedia.org/wiki/Help:Contents User's Guide] for information on using the wiki software.
== Getting started ==
* [//www.mediawiki.org/wiki/Special:MyLanguage/Manual:Configuration_settings Configuration settings list]
* [//www.mediawiki.org/wiki/Special:MyLanguage/Manual:FAQ MediaWiki FAQ]
* [https://lists.wikimedia.org/mailman/listinfo/mediawiki-announce MediaWiki release mailing list]
* [//www.mediawiki.org/wiki/Special:MyLanguage/Localisation#Translation_resources Localise MediaWiki for your language]
Revision 11 (parent 1) · 2018-04-27T23:11:54Z · Haifang
[[Genomic_Institute_Computing_Information_wiki]]
* [//www.mediawiki.org/wiki/Special:MyLanguage/Manual:Configuration_settings Configuration settings list]
* [//www.mediawiki.org/wiki/Special:MyLanguage/Manual:FAQ MediaWiki FAQ]
* [https://lists.wikimedia.org/mailman/listinfo/mediawiki-announce MediaWiki release mailing list]
* [//www.mediawiki.org/wiki/Special:MyLanguage/Localisation#Translation_resources Localise MediaWiki for your language]
Revision 12 (parent 11) · 2018-04-27T23:12:18Z · Haifang
[[Genomic_Institute_Computing_Information_wiki]]
Revision 15 (parent 12) · 2018-04-30T18:29:13Z · Haifang
[[Genomic Institute Computing Information wiki]]
Revision 50 (parent 15) · 2018-07-02T18:53:22Z · Weiler
Welcome to the UC Santa Cruz Genomics Information Wiki! Below are dashboards for various Information Repositories related to the Genomics Institute.
[[Genomics Institute Computing Information]]
Revision 51 (parent 50) · 2018-07-02T18:53:37Z · Weiler
Welcome to the UC Santa Cruz Genomics Information Wiki! Below are dashboards for various Information Repositories related to the Genomics Institute.
[[Genomics Institute Computing Information]]
GenomicInstitute
Revision 3 · 2018-04-23T21:27:31Z · Haifang · Created page
Genomic Institute General Information Repository
==Computing Resources and Support==
*[[Obtain VPN access]]
*[[....]]
Obtain VPN access
Revision 4 · 2018-04-23T21:29:52Z · Haifang · Created page
If you need VPN access to POD or CIRM, please make an appointment with the SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu''. In this email please provide your name, your PI's name, your PI's approval for this access (an email from your PI will be fine), and what other access you need, such as a UNIX server account or access to OpenStack.
Before your appointment please make sure you have the following:
A laptop running OS X, Windows or Ubuntu and connected to '''eduroam'''. cruznet cannot connect to the VPNs.
For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Stable version.
For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select openvpn-install-2.4.5-I601.exe
For Ubuntu, please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn
If that fails, please also install the following:
sudo apt-get install network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
Revision 5 (parent 4) · 2018-04-23T21:36:23Z · Haifang
If you need VPN access to POD or CIRM, please make an appointment with the SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu''. In this email please provide the following:
your name
your PI's name
your PI's approval for this access (an email from your PI will be fine)
what other access you need, such as a UNIX server account or access to OpenStack
Before your appointment please make sure you have the following:
A laptop running OS X, Windows or Ubuntu and connected to '''eduroam'''. cruznet cannot connect to the VPNs. You can find the instructions on setting up '''eduroam''' at https://its.ucsc.edu/wireless/eduroam-config.html
For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Stable version.
For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select openvpn-install-2.4.5-I601.exe
For Ubuntu, please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn
If that fails, please also install the following:
sudo apt-get install network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
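Once the Ubuntu packages above are installed, it can be handy to confirm they are actually present before the appointment. The sketch below assumes a Debian/Ubuntu system; the `check_nm_openvpn` helper name is made up for this example:

```shell
#!/bin/sh
# Sketch: report whether the NetworkManager OpenVPN packages named in the
# instructions above are installed (assumes Debian/Ubuntu, where dpkg-query
# is available; on other systems each package simply reports "missing").
check_nm_openvpn() {
  for pkg in network-manager-openvpn network-manager-openvpn-gnome; do
    if dpkg-query -W -f='${Status}\n' "$pkg" 2>/dev/null | grep -q 'install ok installed'; then
      echo "$pkg: installed"
    else
      echo "$pkg: missing"
    fi
  done
}
check_nm_openvpn
```

If either package reports "missing", rerun the `apt-get install` commands above.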
MediaWiki:Sidebar
Revision 6 · 2018-04-27T21:41:11Z · Haifang · Created page
* navigation
** mainpage|mainpage-description
** recentchanges-url|recentchanges
** randompage-url|randompage
** helppage|help
** Genomic Institute
* SEARCH
* TOOLBOX
* LANGUAGES
Revision 7 (parent 6) · 2018-04-27T21:42:43Z · Haifang
[[Genomic Institute Knowledge wiki]]
Revision 8 (parent 7) · 2018-04-27T21:50:32Z · Haifang
[[Genomic Institute Computing Information wiki]]
Revision 10 (parent 8) · 2018-04-27T23:05:39Z · Haifang
* navigation
** mainpage|mainpage
** portal-url|portal
** currentevents-url|currentevents
** recentchanges-url|recentchanges
** randompage-url|randompage
** helppage|help
** sitesupport-url|sitesupport
** gi|Genomic Institute
Revision 14 (parent 10) · 2018-04-30T18:27:02Z · Haifang
* navigation
** mainpage|mainpage
** portal-url|portal
** currentevents-url|currentevents
** recentchanges-url|recentchanges
** randompage-url|randompage
** helppage|help
** sitesupport-url|sitesupport
** Genomic Institute|Genomic Institute
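For reference, each `**` line in the sidebar revisions above follows MediaWiki's sidebar syntax: the text before the pipe is the link target (an interface-message name, a page title, or a URL), and the text after it is the label (again either a message name or literal text). A sketch of adding one more entry, with a hypothetical label:

```
* navigation
** mainpage|mainpage
** Genomics Institute Computing Information|Computing Information
```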
Genomics Institute Computing Information
Revision 9 · 2018-04-27T22:19:30Z · Haifang · Created page
Genomic Institute Computing Information Repository
==Datacenter Migration==
*[[data migration using AWS S3/Glacier tutorial]]
*[[data storage resources]]
*[[... ...]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
*[[... ...]]
Revision 22 (parent 9) · 2018-05-24T17:26:59Z · Jgarcia · /* Datacenter Migration */
Revision 23 (parent 22) · 2018-05-25T21:54:56Z · Haifang
Genomic Institute Computing Information Repository
==Datacenter Migration==
*[[Public Genomics Institute Infrastructure Ready for Migration]]
*[[data storage resources]]
*[[... ...]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
*[[... ...]]
Revision 28 (parent 23) · 2018-06-15T18:55:26Z · Haifang
Genomic Institute Computing Information Repository
==Datacenter Migration==
*[[Public Genomics Institute Infrastructure Ready for Migration]]
*[[data storage resources]]
*[[... ...]]
==GI Public Computing Environment==
*[[How to access the public servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
*[[... ...]]
Revision 45 (parent 28) · 2018-07-02T18:47:06Z · Weiler · moved page [[Genomic Institute Computing Information wiki]] to [[Genomics Institute Computing Information]]
Gi
Revision 13 · 2018-04-27T23:13:21Z · Haifang · Created page
[[Genomic_Institute_Computing_Information_wiki]]
UC Santa Cruz Genomics Institute
Revision 16 · 2018-04-30T18:30:01Z · Haifang · Created page
[[Genomic Institute Computing Information wiki]]
Revision 47 (parent 16) · 2018-07-02T18:48:39Z · Weiler
Welcome to the UC Santa Cruz Genomics Information Wiki! Below are dashboards for various Information Repositories related to the Genomics Institute.
[[Genomics Institute Computing Information]]
Revision 48 (parent 47) · 2018-07-02T18:49:17Z · Weiler · moved page [[Genomic Institute]] to [[UC Santa Cruz Genomics Institute]]
Requirement for users to get GI VPN access
Revision 17 · 2018-04-30T18:37:01Z · Haifang · Created page
==OpenVPN Client requirement from users==
If you need VPN access to POD or CIRM, please make an appointment with the SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu''. In this email please provide your name, your PI's name, your PI's approval for this access (an email from your PI will be fine), and what other access you need, such as a UNIX server account or access to OpenStack.
Before your appointment please make sure you have the following:
A laptop running OS X, Windows or Ubuntu and connected to '''eduroam'''. cruznet cannot connect to the VPNs.
For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Stable version.
For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select openvpn-install-2.4.5-I601.exe
For Ubuntu, please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn
If that fails, please also install the following:
sudo apt-get install network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
Revision 18 (parent 17) · 2018-04-30T18:37:42Z · Haifang · /* OpenVPN Client requirement from users */
If you need VPN access to POD or CIRM, please make an appointment with the SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu''. In this email please provide your name, your PI's name, your PI's approval for this access (an email from your PI will be fine), and what other access you need, such as a UNIX server account or access to OpenStack.
Before your appointment please make sure you have the following:
A laptop running OS X, Windows or Ubuntu and connected to '''eduroam'''. cruznet cannot connect to the VPNs.
For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Stable version.
For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select openvpn-install-2.4.5-I601.exe
For Ubuntu, please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn
If that fails, please also install the following:
sudo apt-get install network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
Revision 19 (parent 18) · 2018-04-30T18:39:51Z · Haifang
If you need VPN access to POD or CIRM, please make an appointment with the SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu''. In this email please provide your name, your PI's name, your PI's approval for this access (an email from your PI will be fine), and what other access you need, such as a UNIX server account or access to OpenStack.
Before your appointment please make sure you have the following:
A laptop running OS X, Windows or Ubuntu and connected to '''eduroam'''. cruznet cannot connect to the VPNs. You can find the instructions on how to set up '''eduroam''' at https://its.ucsc.edu/wireless/eduroam-config.html.
For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Stable version.
For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select openvpn-install-2.4.5-I601.exe
For Ubuntu, please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn
If that fails, please also install the following:
sudo apt-get install network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
Revision 20 (parent 19) · 2018-04-30T18:51:56Z · Haifang
If you need VPN access to POD or CIRM, please make an appointment with the SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu''. In this email please provide your name, your PI's name, your PI's approval for this access (an email from your PI will be fine), and what other access you need, such as a UNIX server account or access to OpenStack.
Before your appointment please make sure you have the following:
A laptop running OS X, Windows or Ubuntu and connected to '''eduroam'''. cruznet cannot connect to the VPNs. You can find the instructions on how to set up '''eduroam''' at https://its.ucsc.edu/wireless/eduroam-config.html.
For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Stable version.
For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
For Ubuntu, please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn
If that fails, please also install the following:
sudo apt-get install network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
Revision 21 (parent 20) · 2018-05-02T22:50:09Z · Haifang
If you need VPN access to POD or CIRM, please make an appointment with the SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu''. In this email please provide the following:
your name
your PI's name
PI's approval for this access (an email from your PI will be fine)
what other access you need, such as a UNIX server account or access to OpenStack
Before your appointment, please make sure you have the following:
A laptop running OS X, Windows or Ubuntu
A wireless connection to '''eduroam'''; cruznet cannot connect to the VPNs. You can find the instructions on how to set up '''eduroam''' at https://its.ucsc.edu/wireless/eduroam-config.html.
For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Stable version.
For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
For Ubuntu, please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn
If that fails, please also install the following:
sudo apt-get install network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
Revision 35 (parent 21) · 2018-07-02T18:06:36Z · Weiler
If you need VPN access to the Genomics Institute firewalled/secure area (aka the "Prism" Environment), please make an appointment with the GI SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu'' and Rochelle Fuller (hrfuller@ucsc.edu) requesting access. There are several requirements for gaining access to the firewalled area; please complete all of them '''BEFORE''' coming to have the VPN software set up on your laptop.
1: You are required to copy cluster-admin@soe.ucsc.edu on an email from your PI or supervisor requesting a VPN account for you - this email should include:
Your name
Your PI's name
PI's approval for this access
What other access you need, such as a UNIX server account or access to OpenStack.
2: You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it when you come to your appointment to install the VPN software:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2018 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
3: You need to print and sign the Genomics Institute VPN User Agreement and bring it with you to your VPN software installation appointment:
(link here)
4: Read and sign the last page of the NIH Data Use Agreement, located here for download:
PDF DOWNLOAD
5: You will need access to the "eduroam" wireless network '''prior''' to your appointment. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions that prevent it from functioning. Some other universities have such restrictions (notably UCSF), but most other wireless networks, including home networks, should work fine.
6: Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop:
A laptop running OS X, Windows or Ubuntu
For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Latest Stable version.
For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
For Ubuntu, please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
We will correspond with you via email on when the appointment will be. The appointment can take up to 30 minutes per person depending on whether or not any issues come up during the software setup.
Revision 37 (parent 35) · 2018-07-02T18:28:29Z · Weiler
If you need VPN access to the Genomics Institute firewalled/secure area (aka the "Prism" Environment), please make an appointment with the GI SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu'' and Rochelle Fuller (hrfuller@ucsc.edu) requesting access. There are several requirements for gaining access to the firewalled area; please complete all of them '''BEFORE''' coming to have the VPN software set up on your laptop.
1: You are required to copy cluster-admin@soe.ucsc.edu on an email from your PI or supervisor requesting a VPN account for you - this email should include:
Your name
Your PI's name
PI's approval for this access
What other access you need, such as a UNIX server account or access to OpenStack.
2: You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it when you come to your appointment to install the VPN software:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2018 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
3: You need to print and sign the Genomics Institute VPN User Agreement and bring it with you to your VPN software installation appointment:
[[:File:GI_VPN_Policy.pdf]]
4: Read and sign the last page of the NIH Data Use Agreement, located here for download:
PDF DOWNLOAD
5: You will need access to the "eduroam" wireless network '''prior''' to your appointment. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions that prevent it from functioning. Some other universities have such restrictions (notably UCSF), but most other wireless networks, including home networks, should work fine.
6: Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop:
A laptop running OS X, Windows or Ubuntu
For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Latest Stable version.
For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
For Ubuntu, please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
We will correspond with you via email on when the appointment will be. The appointment can take up to 30 minutes per person depending on whether or not any issues come up during the software setup.
Revision 38 (parent 37) · 2018-07-02T18:30:15Z · Weiler
If you need VPN access to the Genomics Institute firewalled/secure area (aka the "Prism" Environment), please make an appointment with the GI SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu'' and Rochelle Fuller (hrfuller@ucsc.edu) requesting access. There are several requirements for gaining access to the firewalled area; please complete all of them '''BEFORE''' coming to have the VPN software set up on your laptop.
1: You are required to copy cluster-admin@soe.ucsc.edu on an email from your PI or supervisor requesting a VPN account for you - this email should include:
Your name
Your PI's name
PI's approval for this access
What other access you need, such as a UNIX server account or access to OpenStack.
2: You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it when you come to your appointment to install the VPN software:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2018 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
3: You need to print and sign the Genomics Institute VPN User Agreement and bring it with you to your VPN software installation appointment:
[[Media:GI_VPN_Policy.pdf]]
4: Read and sign the last page of the NIH Data Use Agreement, located here for download:
PDF DOWNLOAD
5: You will need access to the "eduroam" wireless network '''prior''' to your appointment. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions that prevent it from functioning. Some other universities have such restrictions (notably UCSF), but most other wireless networks, including home networks, should work fine.
6: Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop:
A laptop running OS X, Windows or Ubuntu
For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Latest Stable version.
For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
For Ubuntu, please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
We will correspond with you via email on when the appointment will be. The appointment can take up to 30 minutes per person depending on whether or not any issues come up during the software setup.
Revision 39 (parent 38) · 2018-07-02T18:30:52Z · Weiler
If you need VPN access to the Genomics Institute firewalled/secure area (aka the "Prism" Environment), please make an appointment with the GI SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu'' and Rochelle Fuller (hrfuller@ucsc.edu) requesting access. There are several requirements for gaining access to the firewalled area; please complete all of them '''BEFORE''' coming to have the VPN software set up on your laptop.
1: You are required to copy cluster-admin@soe.ucsc.edu on an email from your PI or supervisor requesting a VPN account for you - this email should include:
Your name
Your PI's name
PI's approval for this access
What other access you need, such as a UNIX server account or access to OpenStack.
2: You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it when you come to your appointment to install the VPN software:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2018 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
3: You need to print and sign the Genomics Institute VPN User Agreement and bring it with you to your VPN software installation appointment:
[[GI_VPN_Policy.pdf]]
4: Read and sign the last page of the NIH Data Use Agreement, located here for download:
PDF DOWNLOAD
5: You will need access to the "eduroam" wireless network '''prior''' to your appointment. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions that prevent it from functioning. Some other universities have such restrictions (notably UCSF), but most other wireless networks, including home networks, should work fine.
6: Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop:
A laptop running OS X, Windows or Ubuntu
For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Latest Stable version.
For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
For Ubuntu, please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
We will correspond with you via email on when the appointment will be. The appointment can take up to 30 minutes per person depending on whether or not any issues come up during the software setup.
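The per-OS choices above can be condensed into one place. The sketch below maps `uname -s` output to the recommended client setup step; the `vpn_client_step` function name is invented for this example, and the package names and URLs are the ones given in the instructions above:

```shell
#!/bin/sh
# Sketch: print the recommended OpenVPN client setup step for an OS name
# as reported by `uname -s` (Darwin = macOS, Linux = Ubuntu here; anything
# else falls through to the Windows-style installer download).
vpn_client_step() {
  case "$1" in
    Darwin) echo "Install Tunnelblick (latest stable) from https://tunnelblick.net/downloads.html" ;;
    Linux)  echo "sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome" ;;
    *)      echo "Install the OpenVPN Client from https://openvpn.net/index.php/open-source/downloads.html" ;;
  esac
}
vpn_client_step "$(uname -s)"
```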
Revision 40 (parent 39) · 2018-07-02T18:31:33Z · Weiler
If you need VPN access to the Genomics Institute firewalled/secure area (aka the "Prism" Environment), please make an appointment with the GI SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu'' and Rochelle Fuller (hrfuller@ucsc.edu) requesting access. There are several requirements to gaining access to the firewalled area - please complete all these requirements '''BEFORE''' coming to have the VPN software set up for your laptop.
1: You are required to copy cluster-admin@soe.ucsc.edu on an email from your PI or supervisor requesting a VPN account for you - this email should include:
Your name
Your PI's name
PI's approval for this access
What other access do you need such as a UNIX server account or access to OpenStack.
2: You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it when you come to your appointment to install the VPN software:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2018 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
3: You need to print and sign the Genomics Institute VPN User Agreement and bring it with you to your VPN software installation appointment:
[[Media:GI_VPN_Policy.pdf]]
4: Read and sign the last page of the NIH Data Use Agreement, located here for download:
PDF DOWNLOAD
5: You will need access to the "eduroam" wireless network '''prior''' to your appointment. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When you are off-campus, the VPN software will usually work unless the wireless network you are on has restrictions that prevent it from functioning. Some other universities have such restrictions (notably UCSF), but most other university and home wireless networks should work fine.
6: Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop:
You will need a laptop running OS X, Windows, or Ubuntu.
For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Latest Stable version.
For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
For Ubuntu, please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you set it up at your appointment.
We will let you know via email when the appointment will be. The appointment can take up to 30 minutes per person, depending on whether any issues come up during the software setup.
70143398cd03dbe34d1a12d47f8072849f5d04bb
41
40
2018-07-02T18:35:07Z
Weiler
3
wikitext
text/x-wiki
If you need VPN access to the Genomics Institute firewalled/secure area (aka the "Prism" Environment), please make an appointment with the GI SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu'' and Rochelle Fuller (hrfuller@ucsc.edu) requesting access. There are several requirements for gaining access to the firewalled area - please complete all of them '''BEFORE''' coming to have the VPN software set up on your laptop.
1: You are required to copy cluster-admin@soe.ucsc.edu on an email from your PI or supervisor requesting a VPN account for you - this email should include:
Your name
Your PI's name
PI's approval for this access
What other access you need, such as a UNIX server account or access to OpenStack.
2: You must take the NIH Public Security Refresher Course online, print out the Completion Certificate (which should have your name on it) at the end of the training, and bring it to your appointment to install the VPN software:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2018 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
3: You need to print and sign the Genomics Institute VPN User Agreement, located here for download, and bring it with you to your VPN software installation appointment:
[[Media:GI_VPN_Policy.pdf]]
4: Read and sign the last page of the NIH Genomic Data Sharing Policy agreement, located here for download:
PDF DOWNLOAD
5: You will need access to the "eduroam" wireless network '''prior''' to your appointment. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When you are off-campus, the VPN software will usually work unless the wireless network you are on has restrictions that prevent it from functioning. Some other universities have such restrictions (notably UCSF), but most other university and home wireless networks should work fine.
6: Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop:
You will need a laptop running OS X, Windows, or Ubuntu.
For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Latest Stable version.
For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
For Ubuntu, please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you set it up at your appointment.
We will let you know via email when the appointment will be. The appointment can take up to 30 minutes per person, depending on whether any issues come up during the software setup.
06740b57436352cc96bbd072210b884fe676739f
43
41
2018-07-02T18:41:41Z
Weiler
3
wikitext
text/x-wiki
If you need VPN access to the Genomics Institute firewalled/secure area (aka the "Prism" Environment), please make an appointment with the GI SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu'' and Rochelle Fuller (hrfuller@ucsc.edu) requesting access. There are several requirements for gaining access to the firewalled area - please complete all of them '''BEFORE''' coming to have the VPN software set up on your laptop.
1: You are required to copy cluster-admin@soe.ucsc.edu on an email from your PI or supervisor requesting a VPN account for you - this email should include:
Your name
Your PI's name
PI's approval for this access
What other access you need, such as a UNIX server account or access to OpenStack.
2: You must take the NIH Public Security Refresher Course online, print out the Completion Certificate (which should have your name on it) at the end of the training, and bring it to your appointment to install the VPN software:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2018 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
3: You need to print and sign the Genomics Institute VPN User Agreement, located here for download, and bring it with you to your VPN software installation appointment:
[[Media:GI_VPN_Policy.pdf]]
4: Please print, read, and sign the last page of the NIH Genomic Data Sharing Policy agreement, located here for download, and staple the pages together. Bring the signed document to your appointment. By signing the document you agree that you have read and understood the policies described therein and that you will abide by them:
[[Media:NIH_GDS_Policy.pdf]]
5: You will need access to the "eduroam" wireless network '''prior''' to your appointment. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When you are off-campus, the VPN software will usually work unless the wireless network you are on has restrictions that prevent it from functioning. Some other universities have such restrictions (notably UCSF), but most other university and home wireless networks should work fine.
6: Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop:
You will need a laptop running OS X, Windows, or Ubuntu.
For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Latest Stable version.
For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
For Ubuntu, please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you set it up at your appointment.
We will let you know via email when the appointment will be. The appointment can take up to 30 minutes per person, depending on whether any issues come up during the software setup. If you show up for your appointment without one (or more) of the requirements outlined above, we will have to reschedule for a time after you have completed them.
dbaf1fd8ebefe74ee1028791303844a27fb779a1
44
43
2018-07-02T18:44:09Z
Weiler
3
wikitext
text/x-wiki
If you need VPN access to the Genomics Institute firewalled/secure area (aka the "Prism" Environment), please make an appointment with the GI SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu'' requesting access. There are several requirements for gaining access to the firewalled area - please complete all of them '''BEFORE''' coming to have the VPN software set up on your laptop.
1: You are required to ask your PI or sponsor to email cluster-admin@soe.ucsc.edu requesting a VPN account for you - this email should include:
Your name
Your PI's name
Your requested username (for example, if your name is Jane Doe, your username could be 'jdoe').
PI's approval for this access
What other access you need, such as a UNIX server account or access to OpenStack.
2: You must take the NIH Public Security Refresher Course online, print out the Completion Certificate (which should have your name on it) at the end of the training, and bring it to your appointment to install the VPN software:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2018 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
3: You need to print and sign the Genomics Institute VPN User Agreement, located here for download, and bring it with you to your VPN software installation appointment:
[[Media:GI_VPN_Policy.pdf]]
4: Please print, read, and sign the last page of the NIH Genomic Data Sharing Policy agreement, located here for download, and staple the pages together. Bring the signed document to your appointment. By signing the document you agree that you have read and understood the policies described therein and that you will abide by them:
[[Media:NIH_GDS_Policy.pdf]]
5: You will need access to the "eduroam" wireless network '''prior''' to your appointment. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When you are off-campus, the VPN software will usually work unless the wireless network you are on has restrictions that prevent it from functioning. Some other universities have such restrictions (notably UCSF), but most other university and home wireless networks should work fine.
6: Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop:
You will need a laptop running OS X, Windows, or Ubuntu.
For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Latest Stable version.
For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
For Ubuntu, please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you set it up at your appointment.
We will let you know via email when the appointment will be. The appointment can take up to 30 minutes per person, depending on whether any issues come up during the software setup. If you show up for your appointment without one (or more) of the requirements outlined above, we will have to reschedule for a time after you have completed them.
5bbb9ecbaed585ca22d2ff677e5f88061daf48b0
Public Genomics Institute Infrastructure Ready for Migration
0
10
24
2018-05-25T22:06:19Z
Haifang
1
Created page with " The new Genomics Institute public infrastructure is ready to begin account creation and data migration. It consists of the shared compute server ‘juggernaut’ attached to..."
wikitext
text/x-wiki
The new Genomics Institute public infrastructure is ready to begin account creation and data migration. It consists of the shared compute server '''juggernaut''' attached to a home directory and file storage server. We will keep the SDSC public '''kolossus''' server and attached storage available '''until June 29th''' to allow time for migrating data. We will be adding additional compute after this migration period. We’re working on the private infrastructure (i.e. the replacement for pod) and anticipate having it available for migration in the next few weeks with a similar migration window - we will email with updates.
==Getting An Account==
Starting Tuesday May 29th, contact Haifang (haifang@ucsc.edu) to set up an account on the new system.
Each account includes a backed-up home directory initially limited to '''30GB'''.
==Large Data Storage==
If you require more space, provide Haifang with the name of a PI or funded project. We will create a shared folder under /public/groups/<PI or project name> that you and others associated with the same PI or project will have access to. If you work with multiple PIs or projects, you will have access to each of their shared folders.
The lab/project may organize data under that folder in any structure, but the overall total size of a PI/project top-level folder under groups will generally be limited to '''10TB''' during migration as we get a better idea of how much data all groups require. This storage will be on reliable RAID6 storage, but it will not be backed up and should be used primarily for data that you are actively working with. For backup and long-term archiving we suggest setting up an AWS account and using Glacier.
0589c37d240984d208e8c5d10e17b52c4499e67c
25
24
2018-05-25T22:13:15Z
Haifang
1
wikitext
text/x-wiki
The new Genomics Institute public infrastructure is ready to begin account creation and data migration. It consists of the shared compute server '''courtyard.gi.ucsc.edu''' attached to a home directory and file storage server. We will keep the SDSC public '''kolossus''' server and attached storage available '''until June 29th''' to allow time for migrating data. We will be adding additional compute after this migration period. We’re working on the private infrastructure (i.e. the replacement for pod) and anticipate having it available for migration in the next few weeks with a similar migration window - we will email with updates.
==Getting An Account==
Starting Tuesday May 29th, contact Haifang (haifang@ucsc.edu) to set up an account on the new system.
Each account includes a backed-up home directory initially limited to '''30GB'''.
==Large Data Storage==
If you require more space, provide Haifang with the name of a PI or funded project. We will create a shared folder under /public/groups/<PI or project name> that you and others associated with the same PI or project will have access to. If you work with multiple PIs or projects, you will have access to each of their shared folders.
The lab/project may organize data under that folder in any structure, but the overall total size of a PI/project top-level folder under groups will generally be limited to '''10TB''' during migration as we get a better idea of how much data all groups require. This storage will be on reliable RAID6 storage, but it will not be backed up and should be used primarily for data that you are actively working with. For backup and long-term archiving we suggest setting up an AWS account and using Glacier.
==Migrating Data==
Once you have an account, you can migrate data from the old to the new infrastructure via rsync. For large shared storage, please coordinate with others in the lab and/or project. For help with this, contact cluster-admin@soe.ucsc.edu
d024e803ff274a39d066104261748dfb9d9422a9
26
25
2018-05-25T22:18:53Z
Haifang
1
wikitext
text/x-wiki
The new Genomics Institute public infrastructure is ready to begin account creation and data migration. It consists of the shared compute server '''courtyard.gi.ucsc.edu''' attached to a home directory and file storage server. We will keep the SDSC public ‘kolossus’ server and attached storage available '''until June 29th''' to allow time for migrating data. We will be adding additional compute after this migration period. We’re working on the private infrastructure (i.e. replacement for pod) and anticipate having it available for migration in the next few weeks with a similar migration window - we will email with updates.
==Getting An Account==
Starting Tuesday May 29th, contact the SysAdmin at ''cluster-admin@soe.ucsc.edu'' to set up an account on the new system.
Each account includes a backed-up home directory initially limited to '''30GB'''.
==Large Data Storage==
If you require more space, provide Haifang with the name of a PI or funded project. We will create a shared folder under /public/groups/<PI or project name> that you and others associated with the same PI or project will have access to. If you work with multiple PIs or projects, you will have access to each of their shared folders.
The lab/project may organize data under that folder in any structure, but the overall total size of a PI/project top-level folder under groups will generally be limited to '''10TB''' during migration as we get a better idea of how much data all groups require. This storage will be on reliable RAID6 storage, but it will not be backed up and should be used primarily for data that you are actively working with. For backup and long-term archiving we suggest setting up an AWS account and using Glacier.
==Migrating Data==
Once you have an account, you can migrate data from the old to the new infrastructure via rsync. For large shared storage, please coordinate with others in the lab and/or project. For help with this, contact ''cluster-admin@soe.ucsc.edu''
300ff1429ac166d0ab2be73b77b0cecb35265b69
27
26
2018-05-25T22:27:36Z
Haifang
1
wikitext
text/x-wiki
The new Genomics Institute public infrastructure is ready to begin account creation and data migration. It consists of the shared compute server '''courtyard.gi.ucsc.edu''' attached to a home directory and file storage server. We will keep the SDSC public '''kolossus''' server and attached storage available '''until June 29th''' to allow time for migrating data. We will be adding additional compute after this migration period. We’re working on the private infrastructure (i.e. replacement for pod) and anticipate having it available for migration in the next few weeks with a similar migration window - we will email with updates.
==Getting An Account==
Starting Tuesday May 29th, contact the SysAdmin at ''cluster-admin@soe.ucsc.edu'' to set up an account on the new system.
Each account includes a backed-up home directory initially limited to '''30GB'''.
==Large Data Storage==
If you require more space, provide the admin with the name of a PI or funded project. We will create a shared directory under /public/groups/<PI or project name> that you and others associated with the same PI or project will have access to. If you work with multiple PIs or projects, you will have access to each of their shared directories.
The lab/project may organize data under that directory in any structure, but the overall total size of a PI/project top-level directory under groups will generally be limited to '''10TB''' during migration as we get a better idea of how much data all groups require. This storage will be on reliable RAID6 storage, but it will not be backed up and should be used primarily for data that you are actively working with. For backup and long-term archiving we suggest setting up an AWS account and using Glacier.
==Migrating Data==
Once you have an account, you can migrate data from the old to the new infrastructure via ''rsync''. For large shared storage, please coordinate with others in the lab and/or project. For help with this, contact ''cluster-admin@soe.ucsc.edu''
76dcaba63d6a9997864dbcb97b5f8334d781ea8f
How to access the public servers
0
11
29
2018-06-15T19:25:17Z
Haifang
1
Created page with "To Genomic Institute public server is courtyard.gi.ucsc.edu To access the server, first request an account by emailing cluster-admin@soe.ucsc.edu. Please provide your full na..."
wikitext
text/x-wiki
The Genomics Institute public server is courtyard.gi.ucsc.edu
To access the server, first request an account by emailing cluster-admin@soe.ucsc.edu. Please provide your full name, your username on the kolossus server, the name of your PI, and the lab or project you are working with. If you are affiliated with the Genomics Institute but do not have an account on kolossus, please make an appointment with the SysAdmin to set up your account in person.
You can also request to be added to a group. A group for a PI is named ''PI's_lastname_lab''; a group for a project is named after the project, such as ''treehouse''.
After you get the account, you can log in by typing:
ssh ''your_username''@courtyard.gi.ucsc.edu
Your home directory path is /public/home/''your_username''. Your home directory quota is 30GB.
A group directory's path is /public/groups/''name_of_the_group''. You can create your own directory there. You share the disk space with your group members. The quota for your group is 10TB.
ea9f8d1a2a61628a49e12a09a7773a598c2be6c2
30
29
2018-06-15T19:26:08Z
Haifang
1
wikitext
text/x-wiki
The Genomics Institute public server is courtyard.gi.ucsc.edu
To access the server, first request an account by emailing cluster-admin@soe.ucsc.edu. Please provide your full name, your username on the kolossus server, the name of your PI, and the lab or project you are working with. If you are affiliated with the Genomics Institute but do not have an account on kolossus, please make an appointment with the SysAdmin to set up your account in person.
You can also request to be added to a group. A group for a PI is named ''PI's_lastname_lab''; a group for a project is named after the project, such as ''treehouse''.
After you get the account, you can log in by typing:
ssh ''your_username''@courtyard.gi.ucsc.edu
Your home directory path is /public/home/''your_username''. Your home directory quota is 30GB.
A group directory's path is /public/groups/''name_of_the_group''. You can create your own directory there. You share the disk space with your group members. The quota for your group is 10TB.
85df174d6e6d34ef3d7aa8d10eb43245bd6a9614
31
30
2018-06-15T19:31:41Z
Haifang
1
wikitext
text/x-wiki
The Genomics Institute public server is courtyard.gi.ucsc.edu
To access the server, first request an account by emailing cluster-admin@soe.ucsc.edu. If you already have an account on kolossus, please provide your full name, your username on the kolossus server, the name of your PI, and the lab or project you are working with. If you are affiliated with the Genomics Institute but do not have an account on kolossus, please make an appointment with the SysAdmin to set up your account in person.
You can also request to be added to a group. A group for a PI is named ''PI's_lastname_lab''; a group for a project is named after the project, such as ''treehouse''.
After you get the account, you can log in by typing:
ssh ''your_username''@courtyard.gi.ucsc.edu
Your home directory path is /public/home/''your_username''. Your home directory quota is 30GB.
A group directory's path is /public/groups/''name_of_the_group''. You can create your own directory there. You share the disk space with your group members. The quota for your group is 10TB.
d258ebf4ed8d9cfc7427ee0940981fdfe75f38d3
32
31
2018-06-21T21:16:34Z
Haifang
1
wikitext
text/x-wiki
The Genomics Institute public server is courtyard.gi.ucsc.edu
To access the server, first request an account by emailing cluster-admin@soe.ucsc.edu. If you already have an account on kolossus, please provide your full name, your username on the kolossus server, the name of your PI, and the lab or project you are working with. If you are affiliated with the Genomics Institute but do not have an account on kolossus, please make an appointment with the SysAdmin to set up your account in person.
You can also request to be added to a group. A group for a PI is named ''PI's_lastname_lab''; a group for a project is named after the project, such as ''treehouse''.
After you get the account, you can log in by typing:
ssh ''your_username''@courtyard.gi.ucsc.edu
Your home directory path is /public/home/''your_username''. Your home directory quota is 30GB.
A group directory's path is /public/groups/''name_of_the_group''. You can create your own directory there. You share the disk space with your group members. The quota for your group is 10TB.
You can use '''''rsync''''' to copy files from kolossus.sdsc.edu to courtyard.gi.ucsc.edu. If needed, you can read about rsync [https://linux.die.net/man/1/rsync here].
ba1fe9d9eab2398663104cc47036182ddf62aece
33
32
2018-06-21T21:17:16Z
Haifang
1
wikitext
text/x-wiki
The Genomics Institute public server is courtyard.gi.ucsc.edu
To access the server, first request an account by emailing cluster-admin@soe.ucsc.edu. If you already have an account on kolossus, please provide your full name, your username on the kolossus server, the name of your PI, and the lab or project you are working with. If you are affiliated with the Genomics Institute but do not have an account on kolossus, please make an appointment with the SysAdmin to set up your account in person.
You can also request to be added to a group. A group for a PI is named ''PI's_lastname_lab''; a group for a project is named after the project, such as ''treehouse''.
After you get the account, you can log in by typing:
ssh ''your_username''@courtyard.gi.ucsc.edu
Your home directory path is /public/home/''your_username''. Your home directory quota is 30GB.
A group directory's path is /public/groups/''name_of_the_group''. You can create your own directory there. You share the disk space with your group members. The quota for your group is 10TB.
You can use '''''rsync''''' to copy files from kolossus.sdsc.edu to courtyard.gi.ucsc.edu. If needed, you can read about rsync [https://linux.die.net/man/1/rsync here].
89e2fcb519ebb0c01049a111a3bf8369321fdf08
34
33
2018-06-21T21:18:13Z
Haifang
1
wikitext
text/x-wiki
The Genomics Institute public server is courtyard.gi.ucsc.edu
To access the server, first request an account by emailing cluster-admin@soe.ucsc.edu. If you already have an account on kolossus, please provide your full name, your username on the kolossus server, the name of your PI, and the lab or project you are working with. If you are affiliated with the Genomics Institute but do not have an account on kolossus, please make an appointment with the SysAdmin to set up your account in person.
You can also request to be added to a group. A group for a PI is named ''PI's_lastname_lab''; a group for a project is named after the project, such as ''treehouse''.
After you get the account, you can log in by typing:
ssh ''your_username''@courtyard.gi.ucsc.edu
Your home directory path is /public/home/''your_username''. Your home directory quota is 30GB.
A group directory's path is /public/groups/''name_of_the_group''. You can create your own directory there. You share the disk space with your group members. The quota for your group is 10TB.
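To check how much of a quota you are currently using, standard disk-usage tools work; for example (''jdoe'' and ''examplelab'' below are hypothetical placeholder names - substitute your own):

```shell
# Report total usage of your home directory (30GB quota).
# 'jdoe' is a placeholder username.
du -sh /public/home/jdoe

# Report total usage of your group's shared directory (10TB quota).
# 'examplelab' is a placeholder group name.
du -sh /public/groups/examplelab
```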
You can use '''''rsync''''' to copy files from kolossus.sdsc.edu to courtyard.gi.ucsc.edu. If needed, you can read about '''''rsync''''' [https://linux.die.net/man/1/rsync here].
a6339e528c39261599e49d13f7e2d4be75b20bb8
File:GI VPN Policy.pdf
6
12
36
2018-07-02T18:26:49Z
Weiler
3
Genomics Institute VPN User Policy
wikitext
text/x-wiki
Genomics Institute VPN User Policy
1497b06cd4180a84d9994df5b17c3083121becc1
File:NIH GDS Policy.pdf
6
13
42
2018-07-02T18:36:19Z
Weiler
3
NIH Genomic Data Sharing Policy
wikitext
text/x-wiki
NIH Genomic Data Sharing Policy
eee54c0fdb7c55dfb2ccbefbf2c828ac89b713aa
Genomic Institute Computing Information wiki
0
14
46
2018-07-02T18:47:07Z
Weiler
3
Weiler moved page [[Genomic Institute Computing Information wiki]] to [[Genomics Institute Computing Information]]
wikitext
text/x-wiki
#REDIRECT [[Genomics Institute Computing Information]]
ebdb8ea87497b6dd8a6b0567cb5fc43c9f4eaa71
Genomic Institute
0
15
49
2018-07-02T18:49:17Z
Weiler
3
Weiler moved page [[Genomic Institute]] to [[UC Santa Cruz Genomics Institute]]
wikitext
text/x-wiki
#REDIRECT [[UC Santa Cruz Genomics Institute]]
1b88a6d5b11ca1e78e4548b24998292003ae8f7d
Main Page
0
1
52
51
2018-07-02T18:53:52Z
Weiler
3
wikitext
text/x-wiki
Welcome to the UC Santa Cruz Genomics Institute Information Wiki! Below are dashboards for the various information repositories related to the Genomics Institute.
[[Genomics Institute Computing Information]]
dea55e8bade97fdb236ca5f399006f8c143d6c93
MediaWiki:Sidebar
8
5
53
14
2018-07-02T18:55:17Z
Weiler
3
wikitext
text/x-wiki
* navigation
** mainpage|mainpage
** portal-url|portal
** recentchanges-url|recentchanges
** helppage|help
** sitesupport-url|sitesupport
9876c759155fff7261a6e50108ea95cb5cd4dc00
73
53
2018-07-13T01:34:45Z
Weiler
3
wikitext
text/x-wiki
* navigation
** mainpage|Genomics Institute Computing Information
** portal-url|portal
** recentchanges-url|recentchanges
** helppage|help
** sitesupport-url|sitesupport
f8ffdf1b795b4b243d05fdfc99f15bbafe6c5772
74
73
2018-07-13T01:35:16Z
Weiler
3
wikitext
text/x-wiki
* navigation
** Genomics Institute Computing Information|Genomics Institute Computing Information
** portal-url|portal
** recentchanges-url|recentchanges
** helppage|help
** sitesupport-url|sitesupport
09f03b50be4898802f0d25637a69fccdb21c254d
Genomics Institute Computing Information
0
6
54
45
2018-07-02T18:55:56Z
Weiler
3
wikitext
text/x-wiki
Genomics Institute Computing Information Repository
==Datacenter Migration==
*[[Public Genomics Institute Infrastructure Ready for Migration]]
*[[data storage resources]]
==GI Public Computing Environment==
*[[How to access the public servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
7878c6737bd8fb74eba3ea7611671f29e68fc7fd
61
54
2018-07-12T20:25:03Z
Haifang
1
/* VPN Access */
wikitext
text/x-wiki
Genomics Institute Computing Information Repository
==Datacenter Migration==
*[[Public Genomics Institute Infrastructure Ready for Migration]]
*[[data storage resources]]
==GI Public Computing Environment==
*[[How to access the public servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
*[[Requirement for users to get POD VPN access]]
cfac32fefae10b55a663ba91c2536dc0791a8e13
65
61
2018-07-13T00:52:38Z
Weiler
3
wikitext
text/x-wiki
Genomics Institute Computing Information Repository
==Datacenter Migration==
*[[Public Genomics Institute Infrastructure Ready for Migration]]
*[[data storage resources]]
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment==
*[[Access to the Firewalled Compute Servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
*[[Requirement for users to get POD VPN access]]
3fc8bb432896831f8be85e1da3341e1a2e9652ae
70
65
2018-07-13T01:25:35Z
Weiler
3
wikitext
text/x-wiki
Genomics Institute Computing Information Repository
==Datacenter Migration==
*[[Public Genomics Institute Infrastructure Ready for Migration]]
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment==
*[[Access to the Firewalled Compute Servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
*[[Requirement for users to get POD VPN access]]
9d8e97ccec7a7346e9b934b187516faab2ea7dd7
71
70
2018-07-13T01:26:02Z
Weiler
3
wikitext
text/x-wiki
Genomics Institute Computing Information Repository
==Datacenter Migration==
*[[Public Genomics Institute Infrastructure Ready for Migration]]
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment (PRISM)==
*[[Access to the Firewalled Compute Servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
*[[Requirement for users to get POD VPN access]]
96ad8af7b2f6015a5ba360b0dbf28a81bed9efbe
76
71
2018-07-13T01:38:58Z
Weiler
3
wikitext
text/x-wiki
Welcome to the Genomic Institute Computing Information Repository! Browse the topics below for help in your area of interest.
==Datacenter Migration==
*[[Public Genomics Institute Infrastructure Ready for Migration]]
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment (PRISM)==
*[[Access to the Firewalled Compute Servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
*[[Requirement for users to get POD VPN access]]
183d2ee451814257505934b3c92a3f208b6c806e
86
76
2018-07-23T16:38:41Z
Haifang
1
/* VPN Access */
wikitext
text/x-wiki
Welcome to the Genomic Institute Computing Information Repository! Browse the topics below for help in your area of interest.
==Datacenter Migration==
*[[Public Genomics Institute Infrastructure Ready for Migration]]
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment (PRISM)==
*[[Access to the Firewalled Compute Servers]]
==VPN Access==
*[[Instructions and Requirements for users to get GI VPN access]]
*[[Requirement for users to get POD VPN access]]
f0d0d371d06ce5909d9c62b26a4af08bf52d3589
87
86
2018-07-23T16:40:12Z
Haifang
1
Undo revision 86 by [[Special:Contributions/Haifang|Haifang]] ([[User talk:Haifang|talk]])
wikitext
text/x-wiki
Welcome to the Genomic Institute Computing Information Repository! Browse the topics below for help in your area of interest.
==Datacenter Migration==
*[[Public Genomics Institute Infrastructure Ready for Migration]]
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment (PRISM)==
*[[Access to the Firewalled Compute Servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
*[[Requirement for users to get POD VPN access]]
183d2ee451814257505934b3c92a3f208b6c806e
97
87
2018-08-22T17:45:34Z
Weiler
3
wikitext
text/x-wiki
Welcome to the Genomic Institute Computing Information Repository! Browse the topics below for help in your area of interest.
==Datacenter Migration==
*[[Public Genomics Institute Infrastructure Ready for Migration]]
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment (PRISM)==
*[[Access to the Firewalled Compute Servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
954b9433b7efb40db7695cf359e7a66fb317f6c9
98
97
2018-08-27T16:24:50Z
Weiler
3
wikitext
text/x-wiki
Welcome to the Genomic Institute Computing Information Repository! Browse the topics below for help in your area of interest.
==Datacenter Migration==
*[[Public Genomics Institute Infrastructure Ready for Migration]]
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment (PRISM)==
*[[Access to the Firewalled Compute Servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
ce96786aa5e923fd5c322ff3ae42003036f604f0
101
98
2018-09-19T16:12:49Z
Weiler
3
wikitext
text/x-wiki
Welcome to the Genomic Institute Computing Information Repository! Browse the topics below for help in your area of interest.
==Datacenter Migration==
*[[Public Genomics Institute Infrastructure Ready for Migration]]
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment (PRISM)==
*[[Access to the Firewalled Compute Servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud OpenStack ==
*[[Overview of giCloud in the Genomics Institute]]
05d49d8b73506fc70ef7fcfccfdc8ec5cd7cdf90
Requirement for users to get GI VPN access
0
9
55
44
2018-07-02T18:58:28Z
Weiler
3
wikitext
text/x-wiki
If you need VPN access to the Genomics Institute firewalled/secure area (aka the "Prism" Environment), please make an appointment with the GI SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu'' requesting access. There are several requirements for gaining access to the firewalled area - please complete all of these requirements '''BEFORE''' coming to have the VPN software set up for your laptop.
1: You are required to ask your PI or sponsor to email cluster-admin@soe.ucsc.edu requesting a VPN account for you - this email should include:
*Your name
*Your PI's name
*Your requested username (if your name is Jane Doe, your username could be 'jdoe', for example)
*Your PI's approval for this access
*What other access you need, such as a UNIX server account or access to OpenStack
2: You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it when you come to your appointment to install the VPN software:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2018 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
3: You need to print and sign the Genomics Institute VPN User Agreement and bring it with you to your VPN software installation appointment, located here for download:
[[Media:GI_VPN_Policy.pdf]]
4: Please print, read and sign the last page of the NIH Genomic Data Sharing Policy agreement, located here for download. Just staple the pages together. Please bring the signed document to your appointment. By signing the document you agree that you have read and understand the policies described therein and that you agree to abide by those policies:
[[Media:NIH_GDS_Policy.pdf]]
5: You will need access to the "eduroam" wireless network '''prior''' to your appointment. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions preventing it from functioning. Some other universities have such restrictions (notably UCSF), but most other university and home wireless networks should work fine.
6: Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop (a laptop running OS X, Windows or Ubuntu is required):
*For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the latest Stable version.
*For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''.
*For Ubuntu, please install network-manager-openvpn by typing:
 sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
We will correspond with you via email to schedule the appointment. The appointment can take up to 30 minutes per person, depending on whether any issues come up during the software setup. If you arrive without completing one or more of the requirements outlined above, we will have to reschedule your appointment for a time after you have completed them.
6734d25dca1b77f5cd2eaeb50b575cc9ac611093
56
55
2018-07-02T19:14:30Z
Weiler
3
wikitext
text/x-wiki
If you need VPN access to the Genomics Institute firewalled/secure area (aka the "Prism" Environment), please make an appointment with the GI SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu'' requesting access. There are several requirements for gaining access to the firewalled area - please complete all of these requirements '''BEFORE''' coming to have the VPN software set up for your laptop.
1: You are required to ask your PI or sponsor to email cluster-admin@soe.ucsc.edu requesting a VPN account for you - this email should include:
*Your name
*Your PI's name
*Your requested username (if your name is Jane Doe, your username could be 'jdoe', for example)
*Your PI's approval for this access
*What other access you need, such as a UNIX server account or access to OpenStack
2: You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it when you come to your appointment to install the VPN software:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2018 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
3: You need to print and sign the Genomics Institute VPN User Agreement and bring it with you to your VPN software installation appointment, located here for download:
[[Media:GI_VPN_Policy.pdf]]
4: Please print, read and sign the last page of the NIH Genomic Data Sharing Policy agreement, located here for download. Just staple the pages together. Please bring the signed document to your appointment. By signing the document you agree that you have read and understand the policies described therein and that you agree to abide by those policies:
[[Media:NIH_GDS_Policy.pdf]]
5: You will need access to the "eduroam" wireless network '''prior''' to your appointment. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions preventing it from functioning. Some other universities have such restrictions (notably UCSF), but most other university and home wireless networks should work fine.
6: Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop (a laptop running OS X, Windows or Ubuntu is required):
*For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the latest Stable version.
*For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''.
*For Ubuntu, please install network-manager-openvpn by typing:
 sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
We will correspond with you via email to schedule the appointment. The appointment can take up to 30 minutes per person, depending on whether any issues come up during the software setup. If you arrive without completing one or more of the requirements outlined above, we will have to reschedule your appointment for a time after you have completed them.
'''PLEASE NOTE:''' Because of the overhead required in setting up VPN access, please only request access if you have an immediate need to work on data that exists behind the firewall. We have had a decent number of people request access and go through the setup but then never use it. In other words, please do not request access because "one day you might need it", but because you '''do''' actually need it!
465a8dae859bc24cb409ff729697445938a9aa65
57
56
2018-07-02T22:59:23Z
Weiler
3
wikitext
text/x-wiki
If you need VPN access to the Genomics Institute firewalled/secure area (aka the "Prism" Environment), please make an appointment with the GI SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu'' requesting access. There are several requirements for gaining access to the firewalled area - please complete all of these requirements '''BEFORE''' coming to have the VPN software set up for your laptop.
1: You are required to ask your PI or sponsor to email cluster-admin@soe.ucsc.edu requesting a VPN account for you - this email should include:
*Your name
*Your PI's name
*Your requested username (if your name is Jane Doe, your username could be 'jdoe', for example)
*Your PI's approval for this access
*What other access you need, such as a UNIX server account or access to OpenStack
2: You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it when you come to your appointment to install the VPN software:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2018 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
3: You need to print and sign the Genomics Institute VPN User Agreement and bring it with you to your VPN software installation appointment, located here for download:
[[Media:GI_VPN_Policy.pdf]]
4: Please print, read and sign the last page of the NIH Genomic Data Sharing Policy agreement, located here for download. Just staple the pages together. Please bring the signed document to your appointment. By signing the document you agree that you have read and understand the policies described therein and that you agree to abide by those policies:
[[Media:NIH_GDS_Policy.pdf]]
5: You will need access to the "eduroam" wireless network '''prior''' to your appointment. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions preventing it from functioning. Some other universities have such restrictions (notably UCSF), but most other university and home wireless networks should work fine.
6: Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop (a laptop running OS X, Windows or Ubuntu is required):
*For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the latest Stable version.
*For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''.
*For Ubuntu, please install network-manager-openvpn by typing:
 sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
We will correspond with you via email to schedule the appointment. The appointment can take up to 30 minutes per person, depending on whether any issues come up during the software setup. If you arrive without completing one or more of the requirements outlined above, we will have to reschedule your appointment for a time after you have completed them.
'''PLEASE NOTE:''' Because of the overhead required in setting up VPN access, please only request access if you have an immediate need to work on data that exists behind the firewall. We have had a decent number of people request access and go through the setup but then never use it. In other words, please do not request access because "one day you might need it", but because you '''do''' actually need it!
6f45a8b6eb5306dcf5d56d244fdc15b9b3fb7448
60
57
2018-07-12T20:20:58Z
Haifang
1
wikitext
text/x-wiki
If you need VPN access to POD, please make an appointment with the SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu''. In this email, please provide:
*your name
*your PI's name
*your PI's approval for this access (an email from your PI will be fine)
*what other access you need, such as a Unix server account or access to OpenStack
Before your appointment, please make sure you have the following:
*A laptop running OS X, Windows or Ubuntu
*A wireless connection to '''eduroam'''. cruznet cannot connect to the VPNs. Instructions on how to set up '''eduroam''' are at https://its.ucsc.edu/wireless/eduroam-config.html.
*'''For Macs''', please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Stable version.
*'''For Windows''', please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''.
*'''For Ubuntu''', please install network-manager-openvpn by typing:
 sudo apt-get install network-manager-openvpn
If that fails, please also install the following:
 sudo apt-get install network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
2dc214accc84efed700fed576489a053e3ae9202
62
60
2018-07-12T20:25:49Z
Haifang
1
Reverted edits by [[Special:Contributions/Haifang|Haifang]] ([[User talk:Haifang|talk]]) to last revision by [[User:Weiler|Weiler]]
wikitext
text/x-wiki
If you need VPN access to the Genomics Institute firewalled/secure area (aka the "Prism" Environment), please make an appointment with the GI SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu'' requesting access. There are several requirements for gaining access to the firewalled area - please complete all of these requirements '''BEFORE''' coming to have the VPN software set up for your laptop.
1: You are required to ask your PI or sponsor to email cluster-admin@soe.ucsc.edu requesting a VPN account for you - this email should include:
*Your name
*Your PI's name
*Your requested username (if your name is Jane Doe, your username could be 'jdoe', for example)
*Your PI's approval for this access
*What other access you need, such as a UNIX server account or access to OpenStack
2: You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it when you come to your appointment to install the VPN software:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2018 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
3: You need to print and sign the Genomics Institute VPN User Agreement and bring it with you to your VPN software installation appointment, located here for download:
[[Media:GI_VPN_Policy.pdf]]
4: Please print, read and sign the last page of the NIH Genomic Data Sharing Policy agreement, located here for download. Just staple the pages together. Please bring the signed document to your appointment. By signing the document you agree that you have read and understand the policies described therein and that you agree to abide by those policies:
[[Media:NIH_GDS_Policy.pdf]]
5: You will need access to the "eduroam" wireless network '''prior''' to your appointment. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions preventing it from functioning. Some other universities have such restrictions (notably UCSF), but most other university and home wireless networks should work fine.
6: Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop (a laptop running OS X, Windows or Ubuntu is required):
*For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the latest Stable version.
*For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''.
*For Ubuntu, please install network-manager-openvpn by typing:
 sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
We will correspond with you via email to schedule the appointment. The appointment can take up to 30 minutes per person, depending on whether any issues come up during the software setup. If you arrive without completing one or more of the requirements outlined above, we will have to reschedule your appointment for a time after you have completed them.
'''PLEASE NOTE:''' Because of the overhead required in setting up VPN access, please only request access if you have an immediate need to work on data that exists behind the firewall. We have had a decent number of people request access and go through the setup but then never use it. In other words, please do not request access because "one day you might need it", but because you '''do''' actually need it!
6f45a8b6eb5306dcf5d56d244fdc15b9b3fb7448
78
62
2018-07-16T19:20:30Z
Weiler
3
wikitext
text/x-wiki
If you need VPN access to the Genomics Institute firewalled/secure area (aka the "Prism" Environment), please make an appointment with the GI SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu'' requesting access. There are several requirements for gaining access to the firewalled area - please complete all of these requirements '''BEFORE''' coming to have the VPN software set up for your laptop.
1: You are required to ask your PI or sponsor to email cluster-admin@soe.ucsc.edu requesting a VPN account for you - this email should include:
*Your name
*Your PI's name
*Your requested username (if your name is Jane Doe, your username could be 'jdoe', for example)
*Your PI's approval for this access
*What other access you need, such as a UNIX server account or access to OpenStack
2: You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it when you come to your appointment to install the VPN software:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2018 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
3: You need to print and sign the Genomics Institute VPN User Agreement and bring it with you to your VPN software installation appointment, located here for download:
[[Media:GI_VPN_Policy.pdf]]
4: Please print, read and sign the last page of the NIH Genomic Data Sharing Policy agreement, located here for download. Just staple the pages together. Please bring the signed document to your appointment. By signing the document you agree that you have read and understand the policies described therein and that you agree to abide by those policies:
[[Media:NIH_GDS_Policy.pdf]]
5: You will need access to the "eduroam" wireless network '''prior''' to your appointment. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions preventing it from functioning. Some other universities have such restrictions (notably UCSF), but most other university and home wireless networks should work fine.
6: Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop (a laptop running OS X, Windows or Ubuntu is required):
*For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the latest Stable version.
*For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''.
*For Ubuntu, please install network-manager-openvpn by typing:
 sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
We will correspond with you via email to schedule the appointment. The appointment can take up to 30 minutes per person, depending on whether any issues come up during the software setup. If you arrive without completing one or more of the requirements outlined above, we will have to reschedule your appointment for a time after you have completed them.
'''PLEASE NOTE:''' Because of the overhead required in setting up VPN access, please only request access if you have an immediate need to work on data that exists behind the firewall. We have had a decent number of people request access and go through the setup but then never use it. In other words, please do not request access because "one day you might need it", but because you '''do''' actually need it!
'''ALSO NOTE:''' VPN accounts expire one year after the date you first gain access. To renew for another year, you will need your PI/sponsor to send us a note asking for renewal.
c12039b2efb795068bfb3f2eccc64f7b224edae4
79
78
2018-07-16T20:34:39Z
Haifang
1
wikitext
text/x-wiki
If you need VPN access to the Genomics Institute firewalled/secure area (aka the "Prism" Environment), please make an appointment with the GI SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu'' requesting access. There are several requirements for gaining access to the firewalled area - please complete all of these requirements '''BEFORE''' coming to have the VPN software set up for your laptop.
Please use this checklist to make sure that you have completed all '''six''' requirements, explained in detail below.
'''1'''. User info, your PI info and your PI's approval
'''2'''. NIH Public Security Refresher Course Certificate
'''3'''. Signed Genomics Institute VPN User Agreement
'''4'''. Signed NIH Genomic Data Sharing Policy Agreement
'''5'''. "eduroam" wireless network set up on your laptop
'''6'''. Appropriate OpenVPN software installed on your laptop
'''1''': You are required to ask your PI or sponsor to email cluster-admin@soe.ucsc.edu requesting a VPN account for you - this email should include:
*Your name
*Your PI's name
*Your requested username (if your name is Jane Doe, your username could be 'jdoe', for example)
*Your PI's approval for this access
*What other access you need, such as a UNIX server account or access to OpenStack
'''2''': You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it when you come to your appointment to install the VPN software:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2018 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
'''3''': You need to print and sign the Genomics Institute VPN User Agreement and bring it with you to your VPN software installation appointment, located here for download:
[[Media:GI_VPN_Policy.pdf]]
'''4''': Please print, read and sign the last page of the NIH Genomic Data Sharing Policy agreement, located here for download. Just staple the pages together. Please bring the signed document to your appointment. By signing the document you agree that you have read and understand the policies described therein and that you agree to abide by those policies:
[[Media:NIH_GDS_Policy.pdf]]
'''5''': You will need access to the "eduroam" wireless network '''prior''' to your appointment. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions preventing it from functioning. Some other universities have such restrictions (notably UCSF), but most other university and home wireless networks should work fine.
'''6''': Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop (a laptop running OS X, Windows or Ubuntu is required):
*For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the latest Stable version.
*For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''.
*For Ubuntu, please install network-manager-openvpn by typing:
 sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
We will correspond with you via email to schedule the appointment. The appointment can take up to 30 minutes per person, depending on whether any issues come up during the software setup. If you arrive without completing one or more of the requirements outlined above, we will have to reschedule your appointment for a time after you have completed them.
'''PLEASE NOTE:''' Because of the overhead required in setting up VPN access, please only request access if you have an immediate need to work on data that exists behind the firewall. We have had a decent number of people request access and go through the setup but then never use it. In other words, please do not request access because "one day you might need it", but because you '''do''' actually need it!
'''ALSO NOTE:''' VPN accounts expire one year after the date you first gain access. To renew for another year, you will need your PI/sponsor to send us a note asking for renewal.
6e08d332f90539f1a8d4b83021d386f3f5faa469
80
79
2018-07-16T20:35:45Z
Haifang
1
wikitext
text/x-wiki
If you need VPN access to the Genomics Institute firewalled/secure area (aka the "Prism" Environment), please make an appointment with the GI SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu'' requesting access. There are several requirements for gaining access to the firewalled area - please complete all of these requirements '''BEFORE''' coming to have the VPN software set up for your laptop.
Please use this checklist to make sure that you have completed all '''six''' requirements.
'''1'''. User info, your PI info and your PI's approval
'''2'''. NIH Public Security Refresher Course Certificate
'''3'''. Signed Genomics Institute VPN User Agreement
'''4'''. Signed NIH Genomic Data Sharing Policy Agreement
'''5'''. "eduroam" wireless network set up on your laptop
'''6'''. Appropriate OpenVPN software installed on your laptop
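As a quick self-check before booking an appointment, the six checklist items above can be tracked with a small shell helper. This is a hedged sketch, not an official tool; the requirement names used here are illustrative shorthand for the six items.

```shell
# Sketch: report which of the six checklist requirements are still
# outstanding. Pass the names of the items you have completed.
# (Names are illustrative shorthand, not official identifiers.)
missing_requirements() {
  reqs="pi_approval nih_certificate vpn_agreement gds_agreement eduroam openvpn_software"
  done_list=" $* "
  for r in $reqs; do
    case "$done_list" in
      *" $r "*) ;;                # already completed, say nothing
      *) printf '%s\n' "$r" ;;    # still outstanding
    esac
  done
}

missing_requirements pi_approval eduroam   # prints the four remaining items
```

If the function prints nothing, all six items are done and you are ready for your appointment.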
'''1''': You are required to ask your PI or sponsor to email cluster-admin@soe.ucsc.edu requesting a VPN account for you - this email should include:
Your name
Your PI's name
Your requested username (If your name is Jane Doe, then your username could be 'jdoe' for example).
PI's approval for this access
What other access you need such as a UNIX server account or access to OpenStack.
'''2''': You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it when you come to your appointment to install the VPN software:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2018 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
'''3''': You need to print and sign the Genomics Institute VPN User Agreement and bring it with you to your VPN software installation appointment, located here for download:
[[Media:GI_VPN_Policy.pdf]]
'''4''': Please print, read and sign the last page of the NIH Genomic Data Sharing Policy agreement, located here for download. Just staple the pages together. Please bring the signed document to your appointment. By signing the document you agree that you have read and understand the policies described therein and that you agree to abide by those policies:
[[Media:NIH_GDS_Policy.pdf]]
'''5''': You will need access to the "eduroam" wireless network '''prior''' to your appointment. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions that prevent it from functioning. Some other universities have such restrictions (notably UCSF), but most other wireless networks and home networks should work fine.
'''6''': Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop:
A laptop running OS X, Windows or Ubuntu
For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Latest Stable version.
For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
For Ubuntu, please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
We will arrange the appointment time with you via email. The appointment can take up to 30 minutes per person, depending on whether any issues come up during the software setup. If you show up for your appointment without one (or more) of the requirements outlined above, we will have to reschedule for a time after you have completed them.
'''PLEASE NOTE:''' Because of the overhead required in setting up VPN access, please only request access if you have an immediate need to work on data that exists behind the firewall. We have had a decent number of people request access and go through the setup but then never use it. In other words, please do not request access because "one day you might need it", but because you '''do''' actually need it!
'''ALSO NOTE:''' VPN accounts expire one year from the date you first gain access. To renew for another year, you will need your PI/sponsor to send us a note requesting renewal.
28b08dfd0a750debb5ab447e808241f72e12edab
81
80
2018-07-16T20:37:51Z
Haifang
1
wikitext
text/x-wiki
If you need VPN access to the Genomics Institute firewalled/secure area (aka the "Prism" Environment), please make an appointment with the GI SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu'' requesting access. There are several requirements for gaining access to the firewalled area - please complete all of them '''BEFORE''' coming to have the VPN software set up for your laptop.
Please use this checklist to make sure that you have completed all '''six''' requirements.
'''1'''. User info, your PI info and your PI's approval
'''2'''. NIH Public Security Refresher Course Certificate
'''3'''. Signed Genomics Institute VPN User Agreement
'''4'''. Signed NIH Genomic Data Sharing Policy Agreement
'''5'''. "eduroam" wireless network set up on your laptop
'''6'''. Installed the appropriate OpenVPN software on your laptop
'''1''': You are required to ask your PI or sponsor to email cluster-admin@soe.ucsc.edu requesting a VPN account for you - this email should include:
Your name
Your PI's name
Your requested username (If your name is Jane Doe, then your username could be 'jdoe' for example).
PI's approval for this access
What other access you need such as a UNIX server account or access to OpenStack.
'''2''': You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it when you come to your appointment to install the VPN software:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2018 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
'''3''': You need to print and sign the Genomics Institute VPN User Agreement and bring it with you to your VPN software installation appointment, located here for download:
[[Media:GI_VPN_Policy.pdf]]
'''4''': Please print, read and sign the last page of the NIH Genomic Data Sharing Policy agreement, located here for download. Just staple the pages together. Please bring the signed document to your appointment. By signing the document you agree that you have read and understand the policies described therein and that you agree to abide by those policies:
[[Media:NIH_GDS_Policy.pdf]]
'''5''': You will need access to the "eduroam" wireless network '''prior''' to your appointment. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions that prevent it from functioning. Some other universities have such restrictions (notably UCSF), but most other wireless networks and home networks should work fine.
'''6''': Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop:
A laptop running OS X, Windows or Ubuntu
For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Latest Stable version.
For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
For Ubuntu, please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
We will arrange the appointment time with you via email. The appointment can take up to 30 minutes per person, depending on whether any issues come up during the software setup. If you show up for your appointment without one (or more) of the requirements outlined above, we will have to reschedule for a time after you have completed them.
'''PLEASE NOTE:''' Because of the overhead required in setting up VPN access, please only request access if you have an immediate need to work on data that exists behind the firewall. We have had a decent number of people request access and go through the setup but then never use it. In other words, please do not request access because "one day you might need it", but because you '''do''' actually need it!
'''ALSO NOTE:''' VPN accounts expire one year from the date you first gain access. To renew for another year, you will need your PI/sponsor to send us a note requesting renewal.
7a5147c871d5630a1b39f4507bb10cf8b0719445
82
81
2018-07-17T04:15:00Z
Weiler
3
wikitext
text/x-wiki
If you need VPN access to the Genomics Institute firewalled/secure area (aka the "Prism" Environment), please make an appointment with the GI SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu'' requesting access. There are several requirements for gaining access to the firewalled area - please complete all of them '''BEFORE''' coming to have the VPN software set up for your laptop.
Please use this checklist to make sure that you have completed all '''six''' requirements.
'''1'''. User info, your PI info and your PI's approval
'''2'''. NIH Public Security Refresher Course Certificate
'''3'''. Signed Genomics Institute VPN User Agreement
'''4'''. Signed NIH Genomic Data Sharing Policy Agreement
'''5'''. "eduroam" wireless network set up on your laptop
'''6'''. Installed the appropriate OpenVPN software on your laptop
'''1''': You are required to ask your PI or sponsor to email cluster-admin@soe.ucsc.edu requesting a VPN account for you - this email should include:
Your name
Your PI's name
Your requested username (If your name is Jane Doe, then your username could be 'jdoe' for example).
PI's approval for this access
What other access you need such as a UNIX server account or access to OpenStack.
'''2''': You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it when you come to your appointment to install the VPN software:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2018 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
'''3''': You need to print and sign the Genomics Institute VPN User Agreement and bring it with you to your VPN software installation appointment, located here for download:
[[Media:GI_VPN_Policy.pdf]]
'''4''': Please print, read and sign the last page of the NIH Genomic Data Sharing Policy agreement, located here for download. Just staple the pages together. Please bring the signed document to your appointment. By signing the document you agree that you have read and understand the policies described therein and that you agree to abide by those policies:
[[Media:NIH_GDS_Policy.pdf]]
'''5''': You will need access to the "eduroam" wireless network '''prior''' to your appointment. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions that prevent it from functioning. Some other universities have such restrictions (notably UCSF), but most other wireless networks and home networks should work fine.
'''6''': Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop:
A laptop running OS X, Windows or Ubuntu
For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Latest Stable version.
For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
For Ubuntu, please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
We will arrange the appointment time with you via email. The appointment can take up to 30 minutes per person, depending on whether any issues come up during the software setup. If you show up for your appointment without one (or more) of the requirements outlined above, we will have to reschedule for a time after you have completed them.
'''PLEASE NOTE:''' Because of the overhead required in setting up VPN access, please only request access if you have an immediate need to work on data that exists behind the firewall. We have had a decent number of people request access and go through the setup but then never use it. In other words, please do not request access because "one day you might need it", but because you '''do''' actually need it!
'''ALSO NOTE:''' VPN accounts expire one year from the date you first gain access. To renew for another year, you will need your PI/sponsor to send us a note requesting renewal.
6b0477cef9231e24b18bdf5c8d66f554e74aceed
83
82
2018-07-17T04:15:33Z
Weiler
3
wikitext
text/x-wiki
If you need VPN access to the Genomics Institute firewalled/secure area (aka the "Prism" Environment), please make an appointment with the GI SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu'' requesting access. There are several requirements for gaining access to the firewalled area - please complete all of them '''BEFORE''' coming to have the VPN software set up for your laptop.
Please use this checklist to make sure that you have completed all '''six''' requirements.
'''1'''. User info, your PI info and your PI's approval
'''2'''. NIH Public Security Refresher Course Certificate
'''3'''. Signed Genomics Institute VPN User Agreement
'''4'''. Signed NIH Genomic Data Sharing Policy Agreement
'''5'''. "eduroam" wireless network set up on your laptop
'''6'''. Installed the appropriate OpenVPN software on your laptop
'''1''': You are required to ask your PI or sponsor to email cluster-admin@soe.ucsc.edu requesting a VPN account for you - this email should include:
Your name
Your PI's name
Your requested username (If your name is Jane Doe, then your username could be 'jdoe' for example).
PI's approval for this access
What other access you need such as a UNIX server account or access to OpenStack.
'''2''': You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it when you come to your appointment to install the VPN software:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2018 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
'''3''': You need to print and sign the Genomics Institute VPN User Agreement and bring it with you to your VPN software installation appointment, located here for download:
[[Media:GI_VPN_Policy.pdf]]
'''4''': Please print, read and sign the last page of the NIH Genomic Data Sharing Policy agreement, located here for download. Just staple the pages together. Please bring the signed document to your appointment. By signing the document you agree that you have read and understand the policies described therein and that you agree to abide by those policies:
[[Media:NIH_GDS_Policy.pdf]]
'''5''': You will need access to the "eduroam" wireless network '''prior''' to your appointment. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions that prevent it from functioning. Some other universities have such restrictions (notably UCSF), but most other wireless networks and home networks should work fine.
'''6''': Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop:
A laptop running OS X, Windows or Ubuntu
For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Latest Stable version.
For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
For Ubuntu, please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
We will arrange the appointment time with you via email. The appointment can take up to 30 minutes per person, depending on whether any issues come up during the software setup. If you show up for your appointment without one (or more) of the requirements outlined above, we will have to reschedule for a time after you have completed them.
'''PLEASE NOTE:''' Because of the overhead required in setting up VPN access, please only request access if you have an immediate need to work on data that exists behind the firewall. We have had a decent number of people request access and go through the setup but then never use it. In other words, please do not request access because "one day you might need it", but because you '''do''' actually need it!
'''ALSO NOTE:''' VPN accounts expire one year from the date you first gain access. To renew for another year, you will need your PI/sponsor to send us a note requesting renewal.
672c75cf7959d15416011fa2bd5765ca74122d15
88
83
2018-07-24T20:47:45Z
Weiler
3
wikitext
text/x-wiki
If you need VPN access to the Genomics Institute firewalled/secure area (aka the "Prism" Environment), please make an appointment with the GI SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu'' requesting access. There are several requirements for gaining access to the firewalled area - please complete all of them '''BEFORE''' coming to have the VPN software set up for your laptop.
Please use this checklist to make sure that you have completed all '''six''' requirements.
'''1'''. User info, your PI info and your PI's approval
'''2'''. NIH Public Security Refresher Course Certificate
'''3'''. Signed Genomics Institute VPN User Agreement
'''4'''. Signed NIH Genomic Data Sharing Policy Agreement
'''5'''. "eduroam" wireless network set up on your laptop
'''6'''. Installed the appropriate OpenVPN software on your laptop
'''1''': You are required to ask your PI or sponsor to email cluster-admin@soe.ucsc.edu requesting a VPN account for you - this email should include:
Your name
Your PI's name
Your requested username (If your name is Jane Doe, then your username could be 'jdoe' for example).
PI's approval for this access
What other access you need such as a UNIX server account or access to OpenStack.
'''2''': You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it when you come to your appointment to install the VPN software. You must complete the course in a single continuous sitting to be able to print the certificate at the end:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2018 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
'''3''': You need to print and sign the Genomics Institute VPN User Agreement and bring it with you to your VPN software installation appointment, located here for download:
[[Media:GI_VPN_Policy.pdf]]
'''4''': Please print, read and sign the last page of the NIH Genomic Data Sharing Policy agreement, located here for download. Just staple the pages together. Please bring the signed document to your appointment. By signing the document you agree that you have read and understand the policies described therein and that you agree to abide by those policies:
[[Media:NIH_GDS_Policy.pdf]]
'''5''': You will need access to the "eduroam" wireless network '''prior''' to your appointment. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions that prevent it from functioning. Some other universities have such restrictions (notably UCSF), but most other wireless networks and home networks should work fine.
'''6''': Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop:
A laptop running OS X, Windows or Ubuntu
For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Latest Stable version.
For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
For Ubuntu, please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
We will arrange the appointment time with you via email. The appointment can take up to 30 minutes per person, depending on whether any issues come up during the software setup. If you show up for your appointment without one (or more) of the requirements outlined above, we will have to reschedule for a time after you have completed them.
'''PLEASE NOTE:''' Because of the overhead required in setting up VPN access, please only request access if you have an immediate need to work on data that exists behind the firewall. We have had a decent number of people request access and go through the setup but then never use it. In other words, please do not request access because "one day you might need it", but because you '''do''' actually need it!
'''ALSO NOTE:''' VPN accounts expire one year from the date you first gain access. To renew for another year, you will need your PI/sponsor to send us a note requesting renewal.
a411d3b421b35dd14f76cd58dbf6f3765243fbb5
How to access the public servers
0
11
58
34
2018-07-06T22:38:22Z
Haifang
1
wikitext
text/x-wiki
The Genomics Institute public server is courtyard.gi.ucsc.edu.
To access the server, first request an account by emailing cluster-admin@soe.ucsc.edu. If you already have an account on kolossus, please provide your full name, your username on the server kolossus, the name of your PI, and the lab or project you are working with. If you are affiliated with the Genomics Institute but do not have an account on kolossus, please make an appointment with the SysAdmin to set up your account in person.
You can also request to be added to a group. A PI's group is named ''PI's_lastname_lab''; a project group is named after the project, such as ''treehouse''.
After you get the account, you can log in by typing:
ssh ''your_username''@courtyard.gi.ucsc.edu
Your home directory path is /public/home/''your_username''. Your home directory quota is 30GB.
A group directory's path is /public/groups/''name_of_the_group''. You can create your own directory under there. You share the disk space with your group mates; the quota for your group is 10TB.
You can use '''''rsync''''' to copy files from kolossus.sdsc.edu to courtyard.gi.ucsc.edu. If needed, you can read about '''''rsync''''' [https://linux.die.net/man/1/rsync here].
If you want to set up a web page on courtyard, do this:
mkdir /public/home/''your_username''/public_html
chmod 755 /public/home/''your_username''/public_html
Put data in the public_html directory. The URL will be:
http://public.gi.ucsc.edu/~''username''/
c1618366c5c9613b40161e613e2423e8fa08dd12
59
58
2018-07-06T22:39:54Z
Haifang
1
wikitext
text/x-wiki
The Genomics Institute public server is courtyard.gi.ucsc.edu.
To access the server, first request an account by emailing cluster-admin@soe.ucsc.edu. If you already have an account on kolossus, please provide your full name, your username on the server kolossus, the name of your PI, and the lab or project you are working with. If you are affiliated with the Genomics Institute but do not have an account on kolossus, please make an appointment with the SysAdmin to set up your account in person.
You can also request to be added to a group. A PI's group is named ''PI's_lastname_lab''; a project group is named after the project, such as ''treehouse''.
After you get the account, you can log in by typing:
ssh ''your_username''@courtyard.gi.ucsc.edu
Your home directory path is /public/home/''your_username''. Your home directory quota is 30GB.
A group directory's path is /public/groups/''name_of_the_group''. You can create your own directory under there. You share the disk space with your group mates; the quota for your group is 10TB.
You can use '''''rsync''''' to copy files from kolossus.sdsc.edu to courtyard.gi.ucsc.edu. If needed, you can read about '''''rsync''''' [https://linux.die.net/man/1/rsync here].
If you want to set up a web page on courtyard, do this:
mkdir /public/home/''your_username''/public_html
chmod 755 /public/home/''your_username''/public_html
Put data in the public_html directory. The URL will be:
http://public.gi.ucsc.edu/''~username''/
fcfb028afb04a0fa9f25709e0c2e298a371043f1
64
59
2018-07-13T00:50:58Z
Weiler
3
wikitext
text/x-wiki
The Genomics Institute public server is courtyard.gi.ucsc.edu.
To access the server, first request an account by emailing cluster-admin@soe.ucsc.edu. If you already have an account on kolossus, please provide your full name, your username on the server kolossus, the name of your PI, and the lab or project you are working with. If you are affiliated with the Genomics Institute but do not have an account on kolossus, please make an appointment with the SysAdmin to set up your account in person.
You can also request to be added to a group. A PI's group is named ''PI's_lastname_lab''; a project group is named after the project, such as ''treehouse''.
After you get the account, you can log in by typing:
ssh ''your_username''@courtyard.gi.ucsc.edu
Your home directory path is /public/home/''your_username''. Your home directory quota is 30GB.
A group directory's path is /public/groups/''name_of_the_group''. You can create your own directory under there. You share the disk space with your group mates; the quota for your group is 15TB.
You can use '''''rsync''''' to copy files from kolossus.sdsc.edu to courtyard.gi.ucsc.edu. If needed, you can read about '''''rsync''''' [https://linux.die.net/man/1/rsync here].
If you want to set up a web page on courtyard, do this:
mkdir /public/home/''your_username''/public_html
chmod 755 /public/home/''your_username''/public_html
Put data in the public_html directory. The URL will be:
http://public.gi.ucsc.edu/''~username''/
3c7a110a92b296eec48ccfed25ac132776ea99a7
69
64
2018-07-13T01:25:05Z
Weiler
3
wikitext
text/x-wiki
== Server Types and Management==
You can access our public compute server via SSH:
'''courtyard.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space
This server is running CentOS 7.5 Linux. It is managed by the Genomics Institute Cluster Admin group. If you need software installed on it, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory is located at /public/home/''username'' and has a 30GB quota. Group storage directories are created per PI, and each group directory has a 15TB quota. For example, if David Haussler is the PI that you report to directly, then the directory would exist as /public/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15TB available per group accordingly.
== Actually Doing Work and Computing ==
When doing research, running jobs, and the like, please be careful of your resource consumption on the server you are on. Don't run so many threads or processes at once that you exhaust the available RAM or disk IO. If you are not sure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Also, before running your jobs, check what is already happening on the server by using the 'top' command to see who and what is running and what resources are already being consumed. If, after starting a process, you realize that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Serving Files to the Public via the Web ==
If you want to set up a web page on courtyard, or serve files over HTTP from there, do this:
mkdir /public/home/''your_username''/public_html
chmod 755 /public/home/''your_username''/public_html
Put data in the public_html directory. The URL will be:
http://public.gi.ucsc.edu/''~username''/
7dfab89fadefe8e9b42c148605273f503804c98c
84
69
2018-07-20T20:25:49Z
Weiler
3
wikitext
text/x-wiki
== How to Gain Access to the Public Genomics Institute Compute Servers ==
If you need access to the Genomics Institute compute servers, please ask your PI or sponsor to email 'cluster-admin@soe.ucsc.edu' requesting that you be granted access. Then we can set up a quick meeting to create your account and go over the details.
== Server Types and Management==
You can access our public compute server via SSH:
'''courtyard.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space
This server is running CentOS 7.5 Linux. It is managed by the Genomics Institute Cluster Admin group. If you need software installed on it, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory is located at /public/home/''username'' and has a 30GB quota. Group storage directories are created per PI, and each group directory has a 15TB quota. For example, if David Haussler is the PI that you report to directly, then the directory would exist as /public/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15TB available per group accordingly.
== Actually Doing Work and Computing ==
When doing research, running jobs, and the like, please be careful of your resource consumption on the server you are on. Don't run so many threads or processes at once that you exhaust the available RAM or disk IO. If you are not sure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Also, before running your jobs, check what is already happening on the server by using the 'top' command to see who and what is running and what resources are already being consumed. If, after starting a process, you realize that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Serving Files to the Public via the Web ==
If you want to set up a web page on courtyard, or serve files over HTTP from there, do the following:
mkdir /public/home/''your_username''/public_html
chmod 755 /public/home/''your_username''/public_html
Put data in the public_html directory. The URL will be:
http://public.gi.ucsc.edu/''~username''/
b527f3f746cc7badcb5f9e3512743957ab3546db
85
84
2018-07-20T20:26:08Z
Weiler
3
wikitext
text/x-wiki
== How to Gain Access to the Public Genomics Institute Compute Servers ==
If you need access to the Genomics Institute compute servers, please ask your PI or sponsor to email 'cluster-admin@soe.ucsc.edu' requesting that you be granted access. Then we can set up a quick meeting to create your account and go over the details.
== Server Types and Management ==
You can log into our public compute server via SSH:
'''courtyard.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space
This server is running CentOS 7.5 Linux. It is managed by the Genomics Institute Cluster Admin group. If you need software installed on it, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory is located at /public/home/username and has a 30GB quota. The group storage directories are created per PI, and each group directory has a 15TB quota. For example, if David Haussler is the PI you report to directly, the directory would be /public/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15TB available per group accordingly.
== Actually Doing Work and Computing ==
When doing research and running jobs, please be careful about your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are unsure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Before starting your work, check what else is already happening on the server with the 'top' command to see who is running what and which resources are already being consumed. If, after starting a process, the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Serving Files to the Public via the Web ==
If you want to set up a web page on courtyard, or serve files over HTTP from there, do the following:
mkdir /public/home/''your_username''/public_html
chmod 755 /public/home/''your_username''/public_html
Put data in the public_html directory. The URL will be:
http://public.gi.ucsc.edu/''~username''/
4f6d0cfeab69dd58235df453256675bf37b9c1b6
89
85
2018-07-31T20:56:49Z
Weiler
3
wikitext
text/x-wiki
== How to Gain Access to the Public Genomics Institute Compute Servers ==
If you need access to the Genomics Institute compute servers, please ask your PI or sponsor to email 'cluster-admin@soe.ucsc.edu' requesting that you be granted access. Then we can set up a quick meeting to create your account and go over the details.
== Server Types and Management ==
You can log into our public compute server via SSH:
'''courtyard.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space
This server is running CentOS 7.5 Linux. It is managed by the Genomics Institute Cluster Admin group. If you need software installed on it, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory is located at /public/home/username and has a 30GB quota. The group storage directories are created per PI, and each group directory has a 15TB quota. For example, if David Haussler is the PI you report to directly, the directory would be /public/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15TB available per group accordingly.
== Actually Doing Work and Computing ==
When doing research and running jobs, please be careful about your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are unsure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Before starting your work, check what else is already happening on the server with the 'top' command to see who is running what and which resources are already being consumed. If, after starting a process, the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Serving Files to the Public via the Web ==
If you want to set up a web page on courtyard, or serve files over HTTP from there, do the following:
mkdir /public/home/''your_username''/public_html
chmod 755 /public/home/''your_username''/public_html
Put data in the public_html directory. The URL will be:
http://public.gi.ucsc.edu/''~username''/
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use for temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there can disappear in the event of a disk failure or other problem. Do not store important data there; if it is important, move it somewhere else soon after creation.
3ebf4b9a7668a4a46cf71230da7e6bb5e25def03
91
89
2018-08-03T22:03:49Z
Weiler
3
wikitext
text/x-wiki
== How to Gain Access to the Public Genomics Institute Compute Servers ==
If you need access to the Genomics Institute compute servers, please ask your PI or sponsor to email 'cluster-admin@soe.ucsc.edu' requesting that you be granted access. Then we can set up a quick meeting to create your account and go over the details.
== Server Types and Management ==
You can log into our public compute servers via SSH:
'''courtyard.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space
'''plaza.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space
These servers are running CentOS 7.5 Linux. They are managed by the Genomics Institute Cluster Admin group. If you need software installed on them, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory is located at /public/home/username and has a 30GB quota. The group storage directories are created per PI, and each group directory has a 15TB quota. For example, if David Haussler is the PI you report to directly, the directory would be /public/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15TB available per group accordingly.
== Actually Doing Work and Computing ==
When doing research and running jobs, please be careful about your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are unsure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Before starting your work, check what else is already happening on the server with the 'top' command to see who is running what and which resources are already being consumed. If, after starting a process, the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Serving Files to the Public via the Web ==
If you want to set up a web page on courtyard, or serve files over HTTP from there, do the following:
mkdir /public/home/''your_username''/public_html
chmod 755 /public/home/''your_username''/public_html
Put data in the public_html directory. The URL will be:
http://public.gi.ucsc.edu/''~username''/
== /scratch Space on the Servers ==
Each server will generally have a local /data/scratch filesystem that you can use for temporary files. '''BE ADVISED''' that /data/scratch is not backed up, and the data there can disappear in the event of a disk failure or other problem. Do not store important data there; if it is important, move it somewhere else soon after creation.
490762d89796f20be68f8408538846f74fbf8883
92
91
2018-08-03T22:04:15Z
Weiler
3
/* Server Types and Management */
wikitext
text/x-wiki
== How to Gain Access to the Public Genomics Institute Compute Servers ==
If you need access to the Genomics Institute compute servers, please ask your PI or sponsor to email 'cluster-admin@soe.ucsc.edu' requesting that you be granted access. Then we can set up a quick meeting to create your account and go over the details.
== Server Types and Management ==
You can log into our public compute servers via SSH:
'''courtyard.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space
'''plaza.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space
These servers are running CentOS 7.5 Linux. They are managed by the Genomics Institute Cluster Admin group. If you need software installed on them, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory is located at /public/home/username and has a 30GB quota. The group storage directories are created per PI, and each group directory has a 15TB quota. For example, if David Haussler is the PI you report to directly, the directory would be /public/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15TB available per group accordingly.
== Actually Doing Work and Computing ==
When doing research and running jobs, please be careful about your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are unsure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Before starting your work, check what else is already happening on the server with the 'top' command to see who is running what and which resources are already being consumed. If, after starting a process, the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Serving Files to the Public via the Web ==
If you want to set up a web page on courtyard, or serve files over HTTP from there, do the following:
mkdir /public/home/''your_username''/public_html
chmod 755 /public/home/''your_username''/public_html
Put data in the public_html directory. The URL will be:
http://public.gi.ucsc.edu/''~username''/
== /scratch Space on the Servers ==
Each server will generally have a local /data/scratch filesystem that you can use for temporary files. '''BE ADVISED''' that /data/scratch is not backed up, and the data there can disappear in the event of a disk failure or other problem. Do not store important data there; if it is important, move it somewhere else soon after creation.
40fe46b049e16a789b963dcc7b7365b2fad8b536
93
92
2018-08-05T03:36:49Z
Weiler
3
wikitext
text/x-wiki
== How to Gain Access to the Public Genomics Institute Compute Servers ==
If you need access to the Genomics Institute compute servers, please ask your PI or sponsor to email 'cluster-admin@soe.ucsc.edu' requesting that you be granted access. Then we can set up a quick meeting to create your account and go over the details.
== Server Types and Management ==
You can log into our public compute servers via SSH:
'''courtyard.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space
'''plaza.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space
These servers are running CentOS 7.5 Linux. They are managed by the Genomics Institute Cluster Admin group. If you need software installed on them, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory is located at /public/home/username and has a 30GB quota. The group storage directories are created per PI, and each group directory has a 15TB quota. For example, if David Haussler is the PI you report to directly, the directory would be /public/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15TB available per group accordingly.
== Actually Doing Work and Computing ==
When doing research and running jobs, please be careful about your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are unsure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Before starting your work, check what else is already happening on the server with the 'top' command to see who is running what and which resources are already being consumed. If, after starting a process, the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Serving Files to the Public via the Web ==
If you want to set up a web page on courtyard, or serve files over HTTP from there, do the following:
mkdir /public/home/''your_username''/public_html
chmod 755 /public/home/''your_username''/public_html
Put data in the public_html directory. The URL will be:
http://public.gi.ucsc.edu/''~username''/
== /scratch Space on the Servers ==
Each server will generally have a local /data/scratch filesystem that you can use for temporary files. '''BE ADVISED''' that /data/scratch is not backed up, and the data there can disappear in the event of a disk failure or other problem. Do not store important data there; if it is important, move it somewhere else soon after creation.
a13fb108de87de45fcb9361e680241b2e0274570
94
93
2018-08-05T03:52:11Z
Weiler
3
/* /scratch Space on the Servers */
wikitext
text/x-wiki
== How to Gain Access to the Public Genomics Institute Compute Servers ==
If you need access to the Genomics Institute compute servers, please ask your PI or sponsor to email 'cluster-admin@soe.ucsc.edu' requesting that you be granted access. Then we can set up a quick meeting to create your account and go over the details.
== Server Types and Management ==
You can log into our public compute servers via SSH:
'''courtyard.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space
'''plaza.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space
These servers are running CentOS 7.5 Linux. They are managed by the Genomics Institute Cluster Admin group. If you need software installed on them, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory is located at /public/home/username and has a 30GB quota. The group storage directories are created per PI, and each group directory has a 15TB quota. For example, if David Haussler is the PI you report to directly, the directory would be /public/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15TB available per group accordingly.
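A quick way to keep an eye on your own footprint against these quotas is a sketch like the following. The hausslerlab path is just the example group directory from above, and the quota system itself may not be visible to these commands; 'du' and 'df' only report filesystem usage:

```shell
# Sketch: check your storage footprint against the 30GB home / 15TB group quotas.
du -sh "$HOME"                                     # total size of your home directory
du -sh /public/groups/hausslerlab 2>/dev/null || true   # example group dir; use your lab's
df -h "$HOME" | tail -n 1                          # space on the filesystem holding your home
```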
== Actually Doing Work and Computing ==
When doing research and running jobs, please be careful about your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are unsure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Before starting your work, check what else is already happening on the server with the 'top' command to see who is running what and which resources are already being consumed. If, after starting a process, the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
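A minimal pre-flight check before launching jobs might look like this (a sketch; 'uptime', 'nproc', and 'free' are standard Linux tools available on any CentOS box):

```shell
# Look before you leap: snapshot the current state of the server.
uptime        # load averages - compare against the core count below
nproc         # number of cores on this machine (64 on courtyard/plaza)
free -h       # RAM currently free vs. the total
# 'top' (interactive) then shows who is running what; press 'q' to quit.
```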
== Serving Files to the Public via the Web ==
If you want to set up a web page on courtyard, or serve files over HTTP from there, do the following:
mkdir /public/home/''your_username''/public_html
chmod 755 /public/home/''your_username''/public_html
Put data in the public_html directory. The URL will be:
http://public.gi.ucsc.edu/''~username''/
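The steps above can be wrapped into a quick sketch. Here a temporary directory stands in for /public/home/''your_username'' so the commands are safe to try anywhere; on courtyard, use your real home directory instead:

```shell
# Sketch: create a world-readable public_html and drop a test page in it.
BASE="$(mktemp -d)"                  # stand-in for /public/home/$USER on courtyard
mkdir -p "$BASE/public_html"
chmod 755 "$BASE/public_html"        # the web server needs world read+execute
echo '<h1>hello</h1>' > "$BASE/public_html/index.html"
ls -ld "$BASE/public_html"           # expect permissions drwxr-xr-x
```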
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use for temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there can disappear in the event of a disk failure or other problem. Do not store important data there; if it is important, move it somewhere else soon after creation.
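A common pattern that follows from this warning: do your heavy I/O in a job directory on local scratch, then copy anything you care about to home or group storage and clean up. A sketch, using /tmp and a temp results dir as stand-ins so it runs anywhere; on the servers, point WORKROOT at /scratch and RESULTS at your home or group directory:

```shell
# Sketch: temp files on local scratch, results copied off to durable storage.
WORKROOT="${WORKROOT:-/tmp}"                    # use /scratch on the compute servers
RESULTS="${RESULTS:-$(mktemp -d)}"              # use your home or group dir in practice
WORK="$(mktemp -d "$WORKROOT/myjob.XXXXXX")"
echo "intermediate output" > "$WORK/result.txt" # ... your job writes here ...
cp "$WORK/result.txt" "$RESULTS/"               # copy the keepers off scratch promptly
rm -rf "$WORK"                                  # scratch is shared - clean up
```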
5b16062515e3b8d168e54d25d40c4a40fe0e0868
Requirement for users to get POD VPN access
0
16
63
2018-07-12T20:36:13Z
Haifang
1
Created page with "If you need VPN access to POD, please make an appointment with the SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu''. In this email please provide your name your P..."
wikitext
text/x-wiki
If you need VPN access to POD, please make an appointment with the SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu''. In this email please provide:
* your name
* your PI's name
* your PI's approval for this access (an email from your PI is fine)
* any other access you need, such as a Unix server account or OpenStack access
Before your appointment, please make sure you have the following:
* A laptop running OS X, Windows, or Ubuntu
* A wireless connection to '''eduroam''' (cruznet cannot connect to the VPNs). Instructions for setting up '''eduroam''' are at https://its.ucsc.edu/wireless/eduroam-config.html.
'''For Macs''', please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Stable version.
'''For Windows''', please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
'''For Ubuntu''', please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn
If that fails, please also install the following:
sudo apt-get install network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
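If you want to sanity-check the Ubuntu installation before your appointment, a quick sketch (package names as above; this only checks that the packages are present, not that the VPN works):

```shell
# Optional check (Ubuntu): verify the NetworkManager OpenVPN plugin packages.
for pkg in network-manager-openvpn network-manager-openvpn-gnome; do
    if dpkg -s "$pkg" >/dev/null 2>&1; then
        echo "$pkg: installed"
    else
        echo "$pkg: missing - rerun the apt-get commands above"
    fi
done
```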
99d84f5b55aa8251835b49455d2aa95faf46c17b
Access to the Firewalled Compute Servers
0
17
66
2018-07-13T01:17:14Z
Weiler
3
Created page with "Before you can access the firewalled environment (Prism), you must get VPN access to it, which is detailed here: [[How to access the public servers]] == Server Types and Man..."
wikitext
text/x-wiki
Before you can access the firewalled environment (Prism), you must get VPN access to it, which is detailed here:
[[How to access the public servers]]
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the compute servers behind the VPN:
'''crimson''': 256GB RAM, 32 cores, 5.5TB local scratch space
'''razzmatazz''': 256GB RAM, 32 cores, 5.5TB local scratch space
These servers are running CentOS 7.5 Linux. They are managed by the Genomics Institute Cluster Admin group. If you need software installed on either or both of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory is located at /private/home/username and has a 30GB quota. The group storage directories are created per PI, and each group directory has a 15TB quota. For example, if David Haussler is the PI you report to directly, the directory would be /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15TB available per group accordingly.
== Actually Doing Work and Computing ==
When doing research and running jobs, please be careful about your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are unsure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Before starting your work, check what else is already happening on the server with the 'top' command to see who is running what and which resources are already being consumed. If, after starting a process, the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
5b5e99443940b0544cd42d5bd2aa21a74a231166
67
66
2018-07-13T01:17:39Z
Weiler
3
wikitext
text/x-wiki
Before you can access the firewalled environment (Prism), you must get VPN access to it, which is detailed here:
[[How to access the public servers]]
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the compute servers behind the VPN:
'''crimson''': 256GB RAM, 32 cores, 5.5TB local scratch space
'''razzmatazz''': 256GB RAM, 32 cores, 5.5TB local scratch space
These servers are running CentOS 7.5 Linux. They are managed by the Genomics Institute Cluster Admin group. If you need software installed on either or both of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory is located at /private/home/username and has a 30GB quota. The group storage directories are created per PI, and each group directory has a 15TB quota. For example, if David Haussler is the PI you report to directly, the directory would be /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15TB available per group accordingly.
== Actually Doing Work and Computing ==
When doing research and running jobs, please be careful about your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are unsure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Before starting your work, check what else is already happening on the server with the 'top' command to see who is running what and which resources are already being consumed. If, after starting a process, the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
f006432397784c13149ca2cbb7c7fbcb4869dc97
68
67
2018-07-13T01:18:11Z
Weiler
3
wikitext
text/x-wiki
Before you can access the firewalled environment (Prism), you must get VPN access to it, which is detailed here:
[[How to access the public servers]]
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the compute servers behind the VPN:
'''crimson.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space
'''razzmatazz.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space
These servers are running CentOS 7.5 Linux. They are managed by the Genomics Institute Cluster Admin group. If you need software installed on either or both of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory is located at /private/home/username and has a 30GB quota. The group storage directories are created per PI, and each group directory has a 15TB quota. For example, if David Haussler is the PI you report to directly, the directory would be /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15TB available per group accordingly.
== Actually Doing Work and Computing ==
When doing research and running jobs, please be careful about your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are unsure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Before starting your work, check what else is already happening on the server with the 'top' command to see who is running what and which resources are already being consumed. If, after starting a process, the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
d76e16f7391160ed97280f5342c7386264493213
72
68
2018-07-13T01:27:08Z
Weiler
3
wikitext
text/x-wiki
Before you can access the firewalled environment (Prism), you must get VPN access to it, which is detailed here:
[[Requirement for users to get GI VPN access]]
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the compute servers behind the VPN:
'''crimson.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space
'''razzmatazz.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space
These servers are running CentOS 7.5 Linux. They are managed by the Genomics Institute Cluster Admin group. If you need software installed on either or both of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory is located at /private/home/username and has a 30GB quota. The group storage directories are created per PI, and each group directory has a 15TB quota. For example, if David Haussler is the PI you report to directly, the directory would be /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15TB available per group accordingly.
== Actually Doing Work and Computing ==
When doing research and running jobs, please be careful about your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are unsure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Before starting your work, check what else is already happening on the server with the 'top' command to see who is running what and which resources are already being consumed. If, after starting a process, the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
25c985b8c78d1663f7be005146d1b4c9e2fe7a08
77
72
2018-07-14T15:48:42Z
Weiler
3
wikitext
text/x-wiki
Before you can access the firewalled environment (Prism), you must get VPN access to it, which is detailed here:
[[Requirement for users to get GI VPN access]]
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the compute servers behind the VPN:
'''crimson.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space
'''razzmatazz.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space
These servers are running CentOS 7.5 Linux. They are managed by the Genomics Institute Cluster Admin group. If you need software installed on either or both of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
We will add another compute server later on, with 1TB RAM, 64 cores, and several TB of local scratch, but not for a while.
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory is located at /private/home/username and has a 30GB quota. The group storage directories are created per PI, and each group directory has a 15TB quota. For example, if David Haussler is the PI you report to directly, the directory would be /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15TB available per group accordingly.
== Actually Doing Work and Computing ==
When doing research and running jobs, please be careful about your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are unsure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Before starting your work, check what else is already happening on the server with the 'top' command to see who is running what and which resources are already being consumed. If, after starting a process, the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== The Firewall ==
All servers in this environment are behind a firewall, so you must connect to the VPN in order to access them; they will not be accessible from the greater Internet without it. You can still make outbound connections from them to other servers on the Internet to copy data in, sync git repos, and the like; only inbound connections are blocked. All machines behind the firewall have the private domain name suffix "*.prism".
c8ba410fbfec4e9b05f3d3af0999675f5a1ed24d
95
90
2018-08-09T15:59:32Z
Weiler
3
wikitext
text/x-wiki
Before you can access the firewalled environment (Prism), you must get VPN access to it, which is detailed here:
[[Requirement for users to get GI VPN access]]
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the compute servers behind the VPN:
'''crimson.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space
'''razzmatazz.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space
'''mustard.prism''': 1.5TB RAM, 160 cores, 9TB local scratch space
These servers are running CentOS 7.5 Linux. They are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
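If you connect to these machines often, an SSH client configuration entry can save typing. A minimal sketch, assuming your cluster username is "youruser" (a placeholder); review it, then merge into your own ~/.ssh/config:

```shell
# Write an example SSH config stanza for the Prism hosts to a scratch file.
# "youruser" is a placeholder username, not a real account.
cat > /tmp/prism_ssh_config <<'EOF'
Host crimson razzmatazz mustard
    HostName %h.prism
    User youruser
EOF
# Review it, then append to ~/.ssh/config if it looks right:
cat /tmp/prism_ssh_config
```

With such a stanza in place, "ssh mustard" expands to "ssh youruser@mustard.prism" while the VPN is up; %h is OpenSSH's placeholder for the host alias given on the command line.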
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory is located at /private/home/username and has a 30GB quota. Group storage directories are created per PI, and each has a 15TB quota. For example, if David Haussler is the PI you report to directly, the directory would be /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the whole lab it belongs to, so be mindful of everyone's data usage and share the 15TB available per group accordingly.
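Standard du/df are enough to see how much of the shared quota is in use before writing large outputs. A sketch; /private/groups/hausslerlab is the example group directory from above, and the fallback to the current directory is only so the commands run anywhere:

```shell
# Check usage of a group directory before writing large data there.
GROUP_DIR="/private/groups/hausslerlab"   # example group dir from above
[ -d "$GROUP_DIR" ] || GROUP_DIR="."      # fallback so this runs anywhere
du -sh "$GROUP_DIR"    # space consumed under the directory tree
df -h "$GROUP_DIR"     # free space on the filesystem backing it
```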
== Actually Doing Work and Computing ==
When doing research, running jobs, and the like, please be mindful of your resource consumption on the server you are on. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are unsure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Before starting your jobs, check what is already happening on the server with the 'top' command to see who and what is running and what resources are already being consumed. If, after starting a process, you notice the server slowing down considerably or becoming unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
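Before launching anything heavy, a quick snapshot of the machine's state takes a few seconds. A sketch using standard Linux tools (the servers run CentOS):

```shell
# Pre-flight check before starting jobs on a shared server.
nproc                                   # how many cores the machine has
uptime                                  # load averages: compare to core count
free -h                                 # RAM currently used vs. available
# One-shot list of the top CPU consumers (no interactive UI):
ps -eo user,pid,pcpu,pmem,comm --sort=-pcpu | head -n 6
```

If the load average is already near the core count, or free memory is low, that is a sign to wait or scale your job down.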
== The Firewall ==
All servers in this environment are behind a firewall, so you must connect to the VPN in order to access them; they are not reachable from the greater Internet without it. You can, however, make outbound connections from them to other servers on the Internet to copy data in, sync git repositories, and the like - only inbound connections are blocked. All machines behind the firewall have the private domain name suffix "*.prism".
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear in the event of a disk failure or other problem. Do not store important data there; if data is important, move it somewhere else very soon after creation.
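A typical /scratch workflow: make yourself a private working directory, compute there, then move anything worth keeping to group storage. A sketch; the fallback to $TMPDIR is only so it runs on machines without /scratch, and the destination path in the comment is illustrative:

```shell
# Work in a private directory under /scratch, then clean up.
SCRATCH=/scratch
[ -d "$SCRATCH" ] && [ -w "$SCRATCH" ] || SCRATCH="${TMPDIR:-/tmp}"
WORKDIR="$(mktemp -d "$SCRATCH/${USER:-user}.job.XXXXXX")"
echo "intermediate data" > "$WORKDIR/part1.txt"
# ... run your pipeline against $WORKDIR ...
# Copy keepers to backed-up storage (destination is illustrative):
# cp -r "$WORKDIR"/results /private/groups/yourlab/
rm -rf "$WORKDIR"   # scratch is shared and unbacked-up: clean up when done
```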
4de8fc4ad6e5246532d9599c7b4ea5e8faede7a0
MediaWiki:Mainpage
8
18
75
2018-07-13T01:37:59Z
Weiler
3
Created page with "Genomics Institute Computing Information"
wikitext
text/x-wiki
Genomics Institute Computing Information
8f0db61c10b172b3645b34f3bf4d089ea852dcd1
Genomics Institute Computing Information
0
6
143
116
2019-09-04T19:09:10Z
Weiler
3
wikitext
text/x-wiki
Welcome to the Genomics Institute Computing Information Repository! Browse the topics below for help in the area you are interested in.
==Datacenter Migration==
*[[Public Genomics Institute Infrastructure Ready for Migration]]
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment (PRISM)==
*[[Access to the Firewalled Compute Servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
== Amazon Web Services Account Management ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
==Kubernetes Information==
*[[Computational Genomics Kubernetes Installation]]
d5e7dde13c3caf636be0f5a10f1c1cfed1f342cf
Overview of giCloud in the Genomics Institute
0
20
103
2018-09-27T18:00:57Z
Weiler
3
Created page with "'''giCloud''' is the Genomics Institute implementation of OpenStack. OpenStack is a IaaS (Infrastructure as a Service) based platform in which you can launch VM instances in..."
wikitext
text/x-wiki
'''giCloud''' is the Genomics Institute implementation of OpenStack. OpenStack is an IaaS (Infrastructure as a Service) platform on which you can launch VM instances in a cloud environment. More about OpenStack can be found here:
https://www.openstack.org
Our particular implementation of OpenStack is located behind the GI VPN service, and as such, you cannot launch VM instances that provide "public" services reachable from the greater Internet. The VM instances you create are meant for testing software or processing data pipelines. The instances are '''not backed up''' and should be treated as such. They have outbound access to the greater Internet, so you can download data, install software, and so on, but no one originating from the Internet outside of the VPN can reach your instances.
The instances you create are also meant for processing secure data, hence the VPN; in addition, the disks are encrypted to provide an extra layer of physical security, satisfying certain requirements of FISMA and HIPAA.
Once you connect to the GI VPN, you can access the web console for giCloud here:
http://gicloud.prism
Note that the connection is over http, not https. This is expected and normal: even though your web browser may complain that the connection is not encrypted, it actually '''IS''' encrypted by the VPN tunnel - your browser simply isn't aware of the VPN connection.
You can get access to the GI VPN service if you are affiliated with the UCSC Genomics Institute. If you need VPN access, please fulfill the requirements as detailed here:
http://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access
After you have VPN access, make an appointment with the Genomics Institute Cluster Admin group by emailing 'cluster-admin@soe.ucsc.edu' and we can set you up with a giCloud account.
f635e625d8794072e592bc71f751235503e28168
Requirements for dbGaP Access
0
19
104
100
2018-11-09T18:37:45Z
Weiler
3
wikitext
text/x-wiki
If you need NIH dbGaP access, there are several requirements you must complete '''BEFORE''' requesting dbGaP credentials. NOTE: If you already have GI VPN access to the GI "Prism" environment, then you have already completed the requirements detailed below - let Rochelle know and we can quickly move to getting you set up.
Please use this checklist to make sure that you have completed all '''three''' requirements.
'''1'''. Your PI's info and your PI's approval
'''2'''. NIH Public Security Refresher Course Certificate
'''3'''. Signed NIH Genomic Data Sharing Policy Agreement
'''1''': You are required to ask your PI or sponsor to email '''Rochelle Fuller (hrfuller@ucsc.edu)''' requesting dbGaP access for you - this email should include:
Your name
Your PI's name
PI's approval for this access
'''2''': You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it to Rochelle Fuller. You must complete the course in a single continuous sitting in order to be able to print the certificate at the end:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2018 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
'''3''': Please print, read, and sign the last page of the NIH Genomic Data Sharing Policy agreement, available for download below; staple the pages together and bring the signed document to your appointment. By signing the document you agree that you have read and understood the policies described therein and that you will abide by them:
[[Media:NIH_GDS_Policy.pdf]]
We will correspond with you via email on when the appointment will be - please email Rochelle about getting everything set up! ('''hrfuller@ucsc.edu''')
56169cb853baa33b35fd78d1c21cbf8b7133b38e
Overview of Getting and Using an AWS IAM Account
0
21
108
107
2019-02-06T18:42:56Z
Weiler
3
wikitext
text/x-wiki
__TOC__
== Getting Amazon Web Services Access ==
The Genomics Institute has a series of AWS accounts, each supporting different projects. If you become associated with one or more of those projects, you will need access to the corresponding account or accounts. We manage AWS IAM access with one 'top level' AWS account that everyone gets access to; once you log in there, you "Switch Role" into the sub-account you run things in.
To get access, ask your PI or Project Manager to email cluster-admin (cluster-admin@soe.ucsc.edu) requesting an AWS account for you and naming the projects you will have access to. The cluster-admin group will contact you with your login credentials. Once you log in, you can change your password if you want to, and you can set up MFA (Multi-Factor Authentication) for your account. You will be required to use MFA in order to "Switch Role" into any of the sub-accounts for the projects you are working on.
The login URL to use when logging in to the top level account is listed below. The top level account is known as "gi-gateway":
https://gi-gateway.signin.aws.amazon.com/console
When you log in, you will see a few error messages on the AWS dashboard saying you don't have access to view certain resources - '''this is normal''', so just ignore them.
== Configuring Account Credentials ==
Once you log in to gi-gateway, you will have very few permissions to do anything there - which is normal, since you will not be working in that account anyway. The gi-gateway account exists only to authenticate you to AWS.
'''Changing Your Password'''
You can change your password by clicking on your username on the top right of the web browser window, just to the right of the little bell. If your username is bill@ucsc.edu, for example:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window.
* Click the '''"My Security Credentials"''' drop-down menu option.
* Click the '''"Change Password"''' button to change your password.
You will also need to configure MFA on your account before you will be allowed to switch roles into another account.
'''Configuring MFA'''
The most common way to configure MFA is with '''Google Authenticator''', a free app available for Apple and Android phones and mobile devices; simply download it from the app store to your phone or tablet to get started. Other MFA apps may also work, but we have not tested everything out there.
Once you have Google Authenticator installed, log into the gi-gateway account using the above URL, then:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"My Security Credentials"''' drop-down menu option.
* Scroll down to the MFA (Multi-Factor Authentication) section of the page, and click '''"Assign MFA Device"'''.
* In the following menu select '''"Virtual MFA Device"'''.
* In the following window click the '''"Show QR Code"''' link, and the MFA QR barcode will appear on your screen.
* Open the Google Authenticator app on your mobile device, and click the little "+" symbol in the top right corner of the app to add an account.
* You will then need to select "Scan Barcode" in the Google Authenticator app to continue, and aim your mobile device camera at the QR barcode.
* The new account MFA device should then be set up and you should see a 6 digit number with a small timer to the right of it.
Type one 6-digit code it displays into your web browser when asked, then wait for the next code to appear after the timer expires, and type that into the second field. It should then inform you that you have successfully associated an MFA device with your account.
Once you have associated an MFA device with your AWS Account, log out, then log back in. It will ask for your username and password, and then ask for your MFA code, which you can view by opening Google Authenticator and seeing what code it is displaying at that time. The code changes every 30 seconds or so.
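The same Switch Role + MFA flow also works from the AWS CLI via a named profile. A sketch of the ~/.aws/config entries involved - the account IDs, role name, region, and username below are placeholders, not real values, and the file is written to /tmp so nothing of yours is overwritten:

```shell
# Example AWS CLI profiles for role-switching with MFA.
# All ARNs and the region below are placeholders - substitute real values.
cat > /tmp/aws_config_example <<'EOF'
[profile gi-gateway]
region = us-west-2

[profile myproject]
role_arn = arn:aws:iam::123456789012:role/MyProjectRole
source_profile = gi-gateway
mfa_serial = arn:aws:iam::210987654321:mfa/bill@ucsc.edu
region = us-west-2
EOF
cat /tmp/aws_config_example
```

With such profiles merged into your own ~/.aws/config, a command like "aws sts get-caller-identity --profile myproject" prompts for the current 6-digit Google Authenticator code before assuming the role.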
4b4de1d069ffd06995dbaff06124caed9b4b93a0
109
108
2019-02-06T18:43:37Z
Weiler
3
wikitext
text/x-wiki
__TOC__
== Getting Amazon Web Services Access ==
The Genomics Institute has a series of AWS accounts, each supporting different projects. If you become associated with one or more of those projects, you will need access to the corresponding account or accounts. We manage AWS IAM access with one 'top level' AWS account that everyone gets access to; once you log in there, you "Switch Role" into the sub-account you run things in.
To get access, ask your PI or Project Manager to email cluster-admin (cluster-admin@soe.ucsc.edu) requesting an AWS account for you and naming the projects you will have access to. The cluster-admin group will contact you with your login credentials. Once you log in, you can change your password if you want to, and you can set up MFA (Multi-Factor Authentication) for your account. You will be required to use MFA in order to "Switch Role" into any of the sub-accounts for the projects you are working on.
The login URL to use when logging in to the top level account is listed below. The top level account is known as "gi-gateway":
https://gi-gateway.signin.aws.amazon.com/console
When you login, you will see a couple error messages on the AWS dashboard saying you don't have access to view certain resources - '''this is normal''', so just ignore the error messages.
== Configuring Account Credentials ==
Once you login to the gi-gateway, you will have very few permissions to do anything there - which is normal, since you will not be working in that account anyway. The gi-gateway account is just there to authenticate you to AWS.
'''Changing Your Password'''
You can change your password by clicking on your username on the top right of the web browser window, just to the right of the little bell. If your username is bill@ucsc.edu, for example:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window.
* Click the '''"My Security Credentials"''' drop-down menu option.
* Click the '''"Change Password"''' button to change your password.
You will also need to configure MFA on your account before you will be allowed to switch roles into another account.
'''Configuring MFA'''
To configure MFA, the most common approach is to use '''Google Authenticator''', an app available for Apple and Android cell phones and mobile devices. The app is free; simply download it from the app store to your cell phone or tablet to get started. Other MFA apps may also work, but we have not tested everything out there.
Once you have Google Authenticator installed, log into the gi-gateway account using the above URL, then:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"My Security Credentials"''' drop-down menu option.
* Scroll down to the MFA (Multi-Factor Authentication) section of the page, and click '''"Assign MFA Device"'''.
* In the following menu select '''"Virtual MFA Device"'''.
* In the following window click the '''"Show QR Code"''' link, and the MFA QR barcode will appear on your screen.
* Open the Google Authenticator app on your mobile device, and click the little "+" symbol in the top right corner
of the app to add an account.
* You will then need to select "Scan Barcode" in the Google Authenticator app to continue, and aim your mobile device camera
at the QR barcode.
* The new account should then be set up in the app, and you should see a 6-digit number with a small timer to the right of it. When the browser asks, type in the current 6-digit code, wait for the timer to expire and the next code to appear, then type that code into the second field. It should then inform you that you have successfully associated an MFA device with your account.
Once you have associated an MFA device with your AWS Account, log out, then log back in. It will ask for your username and password, and then ask for your MFA code, which you can view by opening Google Authenticator and seeing what code it is displaying at that time. The code changes every 30 seconds or so.
648fcf9c7470f6991282bf7aef93e696c1df4852
110
109
2019-02-06T18:55:56Z
Weiler
3
wikitext
text/x-wiki
__TOC__
== Getting Amazon Web Services Access ==
The Genomics Institute has a series of AWS Accounts that all support different projects. Often if you become associated with one or more of those projects, you will need access to that account or accounts. The way we are managing AWS IAM Account Access is that we have one AWS account that is the 'top level' account that everyone gets access to, and then, once you log in there, you can "Switch Role" into another sub-account that you are running things in.
To get access, you will need your PI or Project Manager to email cluster-admin (cluster-admin@soe.ucsc.edu) asking for an AWS account for you, and also in that email to name the projects you will have access to. The cluster-admin group will contact you with your credentials to login. Once you login, you can change your password if you want to and also you will be able to set up MFA (Multi-Factor Authentication) for your account. You will be required to use MFA in order to "Switch Role" into any of the sub-accounts for the projects you are working on.
The login URL to use when logging in to the top level account is listed below. The top level account is known as "gi-gateway":
https://gi-gateway.signin.aws.amazon.com/console
When you login, you will see a couple error messages on the AWS dashboard saying you don't have access to view certain resources - '''this is normal''', so just ignore the error messages.
== Configuring Account Credentials ==
Once you login to the gi-gateway, you will have very few permissions to do anything there - which is normal, since you will not be working in that account anyway. The gi-gateway account is just there to authenticate you to AWS.
'''Changing Your Password'''
You can change your password by clicking on your username on the top right of the web browser window, just to the right of the little bell. If your username is bill@ucsc.edu, for example:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window.
* Click the '''"My Security Credentials"''' drop-down menu option.
* Click the '''"Change Password"''' button to change your password.
You will also need to configure MFA on your account before you will be allowed to switch roles into another account.
'''Configuring MFA'''
To configure MFA, the most common approach is to use '''Google Authenticator''', an app available for Apple and Android cell phones and mobile devices. The app is free; simply download it from the app store to your cell phone or tablet to get started. Other MFA apps may also work, but we have not tested everything out there.
Once you have Google Authenticator installed, log into the gi-gateway account using the above URL, then:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"My Security Credentials"''' drop-down menu option.
* Scroll down to the MFA (Multi-Factor Authentication) section of the page, and click '''"Assign MFA Device"'''.
* In the following menu select '''"Virtual MFA Device"'''.
* In the following window click the '''"Show QR Code"''' link, and the MFA QR barcode will appear on your screen.
* Open the Google Authenticator app on your mobile device, and click the little "+" symbol in the top right corner
of the app to add an account.
* You will then need to select "Scan Barcode" in the Google Authenticator app to continue, and aim your mobile device camera
at the QR barcode.
* The new account should then be set up in the app, and you should see a 6-digit number with a small timer to the right of it. When the browser asks, type in the current 6-digit code, wait for the timer to expire and the next code to appear, then type that code into the second field. It should then inform you that you have successfully associated an MFA device with your account.
Once you have associated an MFA device with your AWS Account, log out, then log back in. It will ask for your username and password, and then ask for your MFA code, which you can view by opening Google Authenticator and seeing what code it is displaying at that time. The code changes every 30 seconds or so.
== Switching Roles into Another AWS Account ==
Now that you have configured a password and enabled MFA, you will be allowed to "Switch Roles" into another account so that you can begin work there. The first time you switch roles into an account it will ask you a few questions; subsequently it will remember which roles you have access to, and they will appear as menu items you can click to quickly switch roles. Let's assume that you want to switch to the 'pangenomics' AWS account, and that you have already been granted access to do so by the cluster-admin group. After logging into the 'gi-gateway' account at the URL listed here (same as above):
https://gi-gateway.signin.aws.amazon.com/console
Do the following to switch roles into the 'pangenomics' account (as an example):
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"Switch Role"''' option in the drop-down menu.
* In the following menu it will ask you about the role you will be assuming. In our example we will use the following:
** Account* = pangenomics
** Role* = developer
** Display Name = [leave blank, or use a short phrase]
** Color = [choose a color for this role]
* Then click the "Switch Role" button. If all went well you should be dumped into the 'pangenomics' account, and you should be identified in the top right-hand corner of the page as '''"developer @ pangenomics"''', indicating your role and the account you are active in. You can then work as normal in that account.
If you wish to switch context back to the 'gi-gateway' account in order to manage something, or to switch to another role in another account, simply:
* Click '''"developer @ pangenomics"''' in the top right corner of the window.
* Select '''"Back to bill@ucsc.edu"'''
You will then be sent back to the 'gi-gateway' context, and you can add another role to switch into, manage your credentials and further switch roles.
== API Access and Secret Keys ==
fda0a9d1700cf84879b08ee25d20f03bf6280e43
111
110
2019-02-06T19:05:40Z
Weiler
3
wikitext
text/x-wiki
__TOC__
== Getting Amazon Web Services Access ==
The Genomics Institute has a series of AWS Accounts that all support different projects. Often if you become associated with one or more of those projects, you will need access to that account or accounts. The way we are managing AWS IAM Account Access is that we have one AWS account that is the 'top level' account that everyone gets access to, and then, once you log in there, you can "Switch Role" into another sub-account that you are running things in.
To get access, you will need your PI or Project Manager to email cluster-admin (cluster-admin@soe.ucsc.edu) asking for an AWS account for you, and also in that email to name the projects you will have access to. The cluster-admin group will contact you with your credentials to login. Once you login, you can change your password if you want to and also you will be able to set up MFA (Multi-Factor Authentication) for your account. You will be required to use MFA in order to "Switch Role" into any of the sub-accounts for the projects you are working on.
The login URL to use when logging in to the top level account is listed below. The top level account is known as "gi-gateway":
https://gi-gateway.signin.aws.amazon.com/console
When you login, you will see a couple error messages on the AWS dashboard saying you don't have access to view certain resources - '''this is normal''', so just ignore the error messages.
== Configuring Account Credentials ==
Once you login to the gi-gateway, you will have very few permissions to do anything there - which is normal, since you will not be working in that account anyway. The gi-gateway account is just there to authenticate you to AWS.
'''Changing Your Password'''
You can change your password by clicking on your username on the top right of the web browser window, just to the right of the little bell. If your username is bill@ucsc.edu, for example:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window.
* Click the '''"My Security Credentials"''' drop-down menu option.
* Click the '''"Change Password"''' button to change your password.
You will also need to configure MFA on your account before you will be allowed to switch roles into another account.
'''Configuring MFA'''
To configure MFA, the most common approach is to use '''Google Authenticator''', an app available for Apple and Android cell phones and mobile devices. The app is free; simply download it from the app store to your cell phone or tablet to get started. Other MFA apps may also work, but we have not tested everything out there.
Once you have Google Authenticator installed, log into the gi-gateway account using the above URL, then:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"My Security Credentials"''' drop-down menu option.
* Scroll down to the MFA (Multi-Factor Authentication) section of the page, and click '''"Assign MFA Device"'''.
* In the following menu select '''"Virtual MFA Device"'''.
* In the following window click the '''"Show QR Code"''' link, and the MFA QR barcode will appear on your screen.
* Open the Google Authenticator app on your mobile device, and click the little "+" symbol in the top right corner
of the app to add an account.
* You will then need to select "Scan Barcode" in the Google Authenticator app to continue, and aim your mobile device camera
at the QR barcode.
* The new account should then be set up in the app, and you should see a 6-digit number with a small timer to the right of it. When the browser asks, type in the current 6-digit code, wait for the timer to expire and the next code to appear, then type that code into the second field. It should then inform you that you have successfully associated an MFA device with your account.
Once you have associated an MFA device with your AWS Account, log out, then log back in. It will ask for your username and password, and then ask for your MFA code, which you can view by opening Google Authenticator and seeing what code it is displaying at that time. The code changes every 30 seconds or so.
== Switching Roles into Another AWS Account ==
Now that you have configured a password and enabled MFA, you will be allowed to "Switch Roles" into another account so that you can begin work there. The first time you switch roles into an account it will ask you a few questions; subsequently it will remember which roles you have access to, and they will appear as menu items you can click to quickly switch roles. Let's assume that you want to switch to the 'pangenomics' AWS account, and that you have already been granted access to do so by the cluster-admin group. After logging into the 'gi-gateway' account at the URL listed here (same as above):
https://gi-gateway.signin.aws.amazon.com/console
Do the following to switch roles into the 'pangenomics' account (as an example):
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"Switch Role"''' option in the drop-down menu.
* In the following menu it will ask you about the role you will be assuming. In our example we will use the following:
** Account* = pangenomics
** Role* = developer
** Display Name = [leave blank, or use a short phrase]
** Color = [choose a color for this role]
* Then click the "Switch Role" button.
If all went well you should be dumped into the 'pangenomics' account, and you should be identified in the top right-hand corner of the page as '''"developer @ pangenomics"''', indicating your role and the account you are active in. You can then work as normal in that account. If you have not yet been given access to that role, you will receive an error message and not be allowed to switch roles.
If you wish to switch context back to the 'gi-gateway' account in order to manage something, or to switch to another role in another account, simply:
* Click '''"developer @ pangenomics"''' in the top right corner of the window.
* Select '''"Back to bill@ucsc.edu"'''
You will then be sent back to the 'gi-gateway' context, and you can add another role to switch into, manage your credentials and further switch roles.
== API Access and Secret Keys ==
If you require programmatic access to AWS, you will very likely be familiar with the AWS concept of Access Keys and Secret Keys, which scripts can use to authenticate to AWS and call its APIs without going through the web console. In the past, access keys and secret keys could be used with no further authentication. This introduces a security risk: those keys must be carefully guarded, because anyone who obtains them can rack up charges on your AWS account without your knowledge!
Using the "Assume Role" mechanism we are now using, Access Keys and Secret Keys can still be created by users '''while logged into the gi-gateway account only'''. Do not try to create keys while you have "Switched Roles" into another account. Keys you create in the top level 'gi-gateway' account will work for you in any sub-account you have access to switch roles to. You will need to do a little more configuration for your keys to work from a UNIX command line however.
81395e129f3944bc0faaf1611c2ecb1dcfe82c66
112
111
2019-02-06T19:47:54Z
Weiler
3
wikitext
text/x-wiki
__TOC__
== Getting Amazon Web Services Access ==
The Genomics Institute has a series of AWS Accounts that all support different projects. Often if you become associated with one or more of those projects, you will need access to that account or accounts. The way we are managing AWS IAM Account Access is that we have one AWS account that is the 'top level' account that everyone gets access to, and then, once you log in there, you can "Switch Role" into another sub-account that you are running things in.
To get access, you will need your PI or Project Manager to email cluster-admin (cluster-admin@soe.ucsc.edu) asking for an AWS account for you, and also in that email to name the projects you will have access to. The cluster-admin group will contact you with your credentials to login. Once you login, you can change your password if you want to and also you will be able to set up MFA (Multi-Factor Authentication) for your account. You will be required to use MFA in order to "Switch Role" into any of the sub-accounts for the projects you are working on.
The login URL to use when logging in to the top level account is listed below. The top level account is known as "gi-gateway":
https://gi-gateway.signin.aws.amazon.com/console
When you login, you will see a couple error messages on the AWS dashboard saying you don't have access to view certain resources - '''this is normal''', so just ignore the error messages.
== Configuring Account Credentials ==
Once you login to the gi-gateway, you will have very few permissions to do anything there - which is normal, since you will not be working in that account anyway. The gi-gateway account is just there to authenticate you to AWS.
'''Changing Your Password'''
You can change your password by clicking on your username on the top right of the web browser window, just to the right of the little bell. If your username is bill@ucsc.edu, for example:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window.
* Click the '''"My Security Credentials"''' drop-down menu option.
* Click the '''"Change Password"''' button to change your password.
You will also need to configure MFA on your account before you will be allowed to switch roles into another account.
'''Configuring MFA'''
To configure MFA, the most common approach is to use '''Google Authenticator''', an app available for Apple and Android cell phones and mobile devices. The app is free; simply download it from the app store to your cell phone or tablet to get started. Other MFA apps may also work, but we have not tested everything out there.
Once you have Google Authenticator installed, log into the gi-gateway account using the above URL, then:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"My Security Credentials"''' drop-down menu option.
* Scroll down to the MFA (Multi-Factor Authentication) section of the page, and click '''"Assign MFA Device"'''.
* In the following menu select '''"Virtual MFA Device"'''.
* In the following window click the '''"Show QR Code"''' link, and the MFA QR barcode will appear on your screen.
* Open the Google Authenticator app on your mobile device, and click the little "+" symbol in the top right corner
of the app to add an account.
* You will then need to select "Scan Barcode" in the Google Authenticator app to continue, and aim your mobile device camera
at the QR barcode.
* The new account should then be set up in the app, and you should see a 6-digit number with a small timer to the right of it. When the browser asks, type in the current 6-digit code, wait for the timer to expire and the next code to appear, then type that code into the second field. It should then inform you that you have successfully associated an MFA device with your account.
Once you have associated an MFA device with your AWS Account, log out, then log back in. It will ask for your username and password, and then ask for your MFA code, which you can view by opening Google Authenticator and seeing what code it is displaying at that time. The code changes every 30 seconds or so.
== Switching Roles into Another AWS Account ==
Now that you have configured a password and enabled MFA, you will be allowed to "Switch Roles" into another account so that you can begin work there. The first time you switch roles into an account it will ask you a few questions; subsequently it will remember which roles you have access to, and they will appear as menu items you can click to quickly switch roles. Let's assume that you want to switch to the 'pangenomics' AWS account, and that you have already been granted access to do so by the cluster-admin group. After logging into the 'gi-gateway' account at the URL listed here (same as above):
https://gi-gateway.signin.aws.amazon.com/console
Do the following to switch roles into the 'pangenomics' account (as an example):
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"Switch Role"''' option in the drop-down menu.
* In the following menu it will ask you about the role you will be assuming. In our example we will use the following:
** Account* = pangenomics
** Role* = developer
** Display Name = [leave blank, or use a short phrase]
** Color = [choose a color for this role]
* Then click the "Switch Role" button.
If all went well you should be dumped into the 'pangenomics' account, and you should be identified in the top right-hand corner of the page as '''"developer @ pangenomics"''', indicating your role and the account you are active in. You can then work as normal in that account. If you have not yet been given access to that role, you will receive an error message and not be allowed to switch roles.
'''NOTE:''' When you switch roles, you may land in a region you don't expect. Always verify the region you are in by looking at the top right of the web page - it will display your region there. Most of our resources are in "Oregon" (us-west-2), but some items live in other regions on a case-by-case basis.
If you wish to switch context back to the 'gi-gateway' account in order to manage something, or to switch to another role in another account, simply:
* Click '''"developer @ pangenomics"''' in the top right corner of the window.
* Select '''"Back to bill@ucsc.edu"'''
You will then be sent back to the 'gi-gateway' context, and you can add another role to switch into, manage your credentials and further switch roles.
== API Access and Secret Keys ==
If you require programmatic access to AWS, you will very likely be familiar with the AWS concept of Access Keys and Secret Keys, which scripts can use to authenticate to AWS and call its APIs without going through the web console. In the past, access keys and secret keys could be used with no further authentication. This introduces a security risk: those keys must be carefully guarded, because anyone who obtains them can rack up charges on your AWS account without your knowledge!
Using the "Assume Role" mechanism we are now using, Access Keys and Secret Keys can still be created by users '''while logged into the gi-gateway account only'''. Do not try to create keys while you have "Switched Roles" into another account. Keys you create in the top level 'gi-gateway' account will work for you in any sub-account you have access to switch roles to. You will need to do a little more configuration for your keys to work from a UNIX command line however.
When using the '''"aws"''' command line tool, assuming you have it installed (the installation process is outside the scope of this page), follow the configuration steps in the AWS documentation linked below:
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
2d33be833dafb25c2311beb9c2e146b89abb84a9
113
112
2019-02-06T19:48:48Z
Weiler
3
wikitext
text/x-wiki
__TOC__
== Getting Amazon Web Services Access ==
The Genomics Institute has a series of AWS accounts, each supporting different projects. If you become associated with one or more of those projects, you will need access to the corresponding account or accounts. We manage AWS IAM access through a single 'top level' account that everyone can log in to; once logged in there, you "Switch Role" into the sub-account in which you actually run things.
To get access, ask your PI or Project Manager to email cluster-admin (cluster-admin@soe.ucsc.edu) requesting an AWS account for you and naming the projects you will have access to. The cluster-admin group will contact you with your login credentials. Once you log in, you can change your password if you want to, and you will be able to set up MFA (Multi-Factor Authentication) for your account. You will be required to use MFA in order to "Switch Role" into any of the sub-accounts for the projects you are working on.
The login URL to use when logging in to the top level account is listed below. The top level account is known as "gi-gateway":
https://gi-gateway.signin.aws.amazon.com/console
When you log in, you will see a couple of error messages on the AWS dashboard saying you don't have access to view certain resources - '''this is normal''', so just ignore them.
== Configuring Account Credentials ==
Once you log in to gi-gateway, you will have very few permissions to do anything there - which is normal, since you will not be working in that account anyway. The gi-gateway account exists only to authenticate you to AWS.
'''Changing Your Password'''
You can change your password by clicking your username at the top right of the browser window, just to the right of the little bell icon. If your username is bill@ucsc.edu, for example:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window.
* Click the '''"My Security Credentials"''' drop-down menu option.
* Click the '''"Change Password"''' button to change your password.
You will also need to configure MFA on your account before you will be allowed to switch roles into another account.
'''Configuring MFA'''
To configure MFA, the most common approach is to use '''Google Authenticator''', an app available for Apple and Android cell phones and mobile devices. The app is free; simply download it from the app store to your cell phone or tablet to get started. Other MFA apps may also work, but we have not tested everything out there.
Once you have Google Authenticator installed, log into the gi-gateway account using the above URL, then:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"My Security Credentials"''' drop-down menu option.
* Scroll down to the MFA (Multi-Factor Authentication) section of the page, and click '''"Assign MFA Device"'''.
* In the following menu select '''"Virtual MFA Device"'''.
* In the following window click the '''"Show QR Code"''' link, and the MFA QR barcode will appear on your screen.
* Open the Google Authenticator app on your mobile device, and click the little "+" symbol in the top right corner
of the app to add an account.
* You will then need to select "Scan Barcode" in the Google Authenticator app to continue, and aim your mobile device camera
at the QR barcode.
* The new account should then be set up in the app, and you should see a 6-digit number with a small timer to the right of it. When the browser asks, type in the current 6-digit code, wait for the timer to expire and the next code to appear, then type that code into the second field. It should then inform you that you have successfully associated an MFA device with your account.
Once you have associated an MFA device with your AWS account, log out, then log back in. It will ask for your username and password, and then for your MFA code, which you can view by opening Google Authenticator and reading the code it is displaying at that time. The code changes every 30 seconds.
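For the curious, the 6-digit codes Google Authenticator displays are standard TOTP one-time passwords (RFC 6238) derived from the secret embedded in the QR code. You do not need this for setup; it is just a minimal sketch of what the app computes, using the RFC's published test secret rather than a real one:

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32, now=None, step=30, digits=6):
    """Compute an RFC 6238 TOTP code, the way authenticator apps do."""
    # The QR code carries a base32-encoded shared secret.
    secret_b32 = secret_b32.upper()
    key = base64.b32decode(secret_b32 + "=" * (-len(secret_b32) % 8))
    # The moving factor is the number of 30-second steps since the Unix epoch.
    counter = int((time.time() if now is None else now) // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): pick 4 bytes, mask the sign bit, keep low digits.
    offset = digest[-1] & 0x0F
    code = (struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF) % (10 ** digits)
    return str(code).zfill(digits)

# RFC 6238 test secret ("12345678901234567890" in base32), at time T=59:
print(totp("GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ", now=59))  # -> 287082
```

Both the app and AWS run this same computation against the shared secret, which is why the two consecutive codes you enter during setup prove the device is synchronized.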
== Switching Roles into Another AWS Account ==
Now that you have configured a password and enabled MFA, you will be allowed to "Switch Roles" into another account so that you can begin work there. The first time you switch roles into an account it will ask you a few questions; subsequently it will remember which roles you have access to, and they will appear as menu items you can click to quickly switch roles. Let's assume that you want to switch to the 'pangenomics' AWS account, and that you have already been granted access to do so by the cluster-admin group. After logging into the 'gi-gateway' account at the URL listed here (same as above):
https://gi-gateway.signin.aws.amazon.com/console
Do the following to switch roles into the 'pangenomics' account (as an example):
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"Switch Role"''' option in the drop-down menu.
* In the following menu it will ask you about the role you will be assuming. In our example we will use the following:
** Account* = pangenomics
** Role* = developer
** Display Name = [leave blank, or use a short phrase]
** Color = [choose a color for this role]
* Then click the "Switch Role" button.
If all went well you should be dumped into the 'pangenomics' account, and you should be identified in the top right hand corner of the page as '''"developer @ pangenomics"''', indicating your role and the account you are active in. You can then work as normal in that account. If you have not yet been given access to that role, you will receive an error message and not be allowed to switch roles.
'''NOTE:''' When you switch roles, you may land in a region you don't expect. Always verify the region you are in by looking at the top right of the web page - it will display your region there. Most of our resources are in "Oregon" (us-west-2), but some items live in other regions on a case-by-case basis.
If you wish to switch context back to the 'gi-gateway' account in order to manage something, or to switch to another role in another account, simply:
* Click '''"developer @ pangenomics"''' in the top right corner of the window.
* Select '''"Back to bill@ucsc.edu"'''
You will then be sent back to the 'gi-gateway' context, and you can add another role to switch into, manage your credentials and further switch roles.
== API Access and Secret Keys ==
If you require programmatic access to AWS, you will very likely be familiar with the AWS concept of Access Keys and Secret Keys, which can be used by scripts to authenticate yourself to AWS and use the APIs there without using the web console to authenticate. In the past, access keys and secret keys could be used by users with no further authentication. This introduces a security risk, as the management of those keys must be carefully guarded - if anyone gets your keys, they can rack up charges on your AWS account without your knowledge!
Using the "Assume Role" mechanism we are now using, Access Keys and Secret Keys can still be created by users '''while logged into the gi-gateway account only'''. Do not try to create keys while you have "Switched Roles" into another account. Keys you create in the top level 'gi-gateway' account will work for you in any sub-account you have access to switch roles to. You will need to do a little more configuration for your keys to work from a UNIX command line however.
When using the '''"aws"''' command line tool, assuming you have it installed (the process of which is outside the scope of this document), you would use the steps outlined in this document to configure it:
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
b9cbf7525a3bc394eeca792e4ba83fb112d63cfb
114
113
2019-02-06T22:24:48Z
Weiler
3
wikitext
text/x-wiki
119
115
2019-02-07T00:15:08Z
Weiler
3
wikitext
text/x-wiki
__TOC__
== Getting Amazon Web Services Access ==
The Genomics Institute maintains a series of AWS accounts, each supporting different projects. If you become associated with one or more of those projects, you will need access to the corresponding account or accounts. We manage AWS IAM access through a single 'top level' account that everyone logs into; from there, you "Switch Role" into the sub-account where you actually run things.
To get access, have your PI or Project Manager email cluster-admin (cluster-admin@soe.ucsc.edu) asking for an AWS account for you, naming in that email the projects you will have access to. The cluster-admin group will contact you with your login credentials. Once you log in, you can change your password if you wish, and you will be able to set up MFA (Multi-Factor Authentication) for your account. MFA is required in order to "Switch Role" into any of the sub-accounts for the projects you are working on.
The login URL to use when logging in to the top level account is listed below. The top level account is known as "gi-gateway":
https://gi-gateway.signin.aws.amazon.com/console
When you log in, you will see a couple of error messages on the AWS dashboard saying you don't have access to view certain resources - '''this is normal''', so just ignore them.
== Configuring Account Credentials ==
Once you log in to gi-gateway, you will have very few permissions to do anything there - which is normal, since you will not be working in that account anyway. The gi-gateway account is just there to authenticate you to AWS.
'''Changing Your Password'''
You can change your password by clicking on your username on the top right of the web browser window, just to the right of the little bell. If your username is bill@ucsc.edu, for example:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window.
* Click the '''"My Security Credentials"''' drop-down menu option.
* Click the '''"Change Password"''' button to change your password.
You will also need to configure MFA on your account before you will be allowed to switch roles into another account.
'''Configuring MFA'''
The most common way to configure MFA is with '''Google Authenticator''', a free app available for Apple and Android phones, tablets, and other mobile devices; download it from your device's app store to get started. Other MFA apps may also work, but we have not tested everything out there.
Once you have Google Authenticator installed, log into the gi-gateway account using the above URL, then:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"My Security Credentials"''' drop-down menu option.
* Scroll down to the MFA (Multi-Factor Authentication) section of the page, and click '''"Assign MFA Device"'''.
* In the following menu select '''"Virtual MFA Device"'''.
* In the following window click the '''"Show QR Code"''' link, and the MFA QR barcode will appear on your screen.
* Open the Google Authenticator app on your mobile device, and tap the "+" symbol in the top right corner of the app to add an account.
* Select "Scan Barcode" in the Google Authenticator app to continue, and aim your mobile device's camera at the QR barcode.
* The new MFA device should then be set up, and you should see a 6-digit number with a small timer to the right of it. When asked, type the 6-digit code it displays into your web browser, then wait for the next code to appear after the timer expires, and type that into the second field. It should then inform you that you have successfully associated an MFA device with your account.
Once you have associated an MFA device with your AWS Account, log out, then log back in. It will ask for your username and password, and then ask for your MFA code, which you can view by opening Google Authenticator and seeing what code it is displaying at that time. The code changes every 30 seconds or so.
== Switching Roles into Another AWS Account ==
Now that you have configured a password and enabled MFA, you will be allowed to "Switch Roles" into another account so you can begin work there. The first time you switch roles into an account, it will ask you a few questions; subsequently it remembers which roles you have used, and they become menu items you can click to switch roles quickly. Let's assume you want to switch to the 'pangenomics' AWS account and have already been granted access to do so by the cluster-admin group. After logging into the 'gi-gateway' account at the URL listed here (same as above):
https://gi-gateway.signin.aws.amazon.com/console
Do the following to switch roles into the 'pangenomics' account (as an example):
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"Switch Role"''' option in the drop-down menu.
* In the following menu it will ask you about the role you will be assuming. In our example we will use the following:
-Account* = pangenomics
-Role* = developer
-Display Name = [leave blank, or use a short phrase]
-Color = [choose a color for this role]
* Then click the "Switch Role" button.
If all went well, you will land in the 'pangenomics' account and be identified in the top right-hand corner of the page as '''"developer @ pangenomics"''', indicating your role and the account you are active in. You can then work as normal in that account. If you have not yet been given access to that role, you will receive an error message and will not be allowed to switch roles.
'''NOTE:''' When you switch roles, you may land in a region you don't expect. Always verify the region you are in by looking at the top right of the web page - it is displayed there. Most of our stuff exists in "Oregon" (us-west-2), but some items appear in other regions on a per-case basis.
If you wish to switch context back to the 'gi-gateway' account in order to manage something, or to switch to another role in another account, simply:
* Click '''"developer @ pangenomics"''' in the top right corner of the window.
* Select '''"Back to bill@ucsc.edu"'''
You will then be sent back to the 'gi-gateway' context, where you can add another role to switch into, manage your credentials, or switch roles again.
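'''TIP:''' The "Switch Role" form can also be pre-filled via a deep link, using AWS's standard switchrole URL pattern. For example, with the example account and role names used above (the account field accepts the account alias or the 12-digit account number, and displayName is whatever label you like):
https://signin.aws.amazon.com/switchrole?account=pangenomics&roleName=developer&displayName=pangenomics-developer
Bookmarking a link like this takes you straight to the pre-filled form for a role you use often.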
== API Access and Secret Keys ==
If you require programmatic access to AWS, you are likely familiar with the AWS concept of Access Keys and Secret Keys, which scripts can use to authenticate to AWS and call its APIs without going through the web console. In the past, access keys and secret keys could be used with no further authentication. This is a security risk, as those keys must be carefully guarded - anyone who obtains them can rack up charges on your AWS account without your knowledge!
Under the "Assume Role" mechanism we now use, Access Keys and Secret Keys can still be created by users '''while logged into the gi-gateway account only'''. Do not try to create keys while you have "Switched Roles" into another account. Keys you create in the top-level 'gi-gateway' account will work for you in any sub-account you have access to switch roles into. You will, however, need a little more configuration for your keys to work from a UNIX command line.
When using the '''"aws"''' command line tool - assuming you have it installed, which is outside the scope of this document - configure it using the steps outlined here:
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
That document has a lot of other really useful information in it - if you plan on using keys for API access, we highly recommend reading it through.
If you plan on using keys for API access, at minimum you will need to configure the "aws" utility and then tweak the config a bit for our setup. To start, run "aws configure". It should look something like this:
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]:
Most folks do that to start. It creates two files:
~/.aws/config
~/.aws/credentials
Those two files are important to access AWS via the 'aws' command.
'''~/.aws/credentials'''
This file contains your access key and secret key, and should not need to be modified after running 'aws configure'. Your same keys can be used to access any roles in any accounts you have access to.
'''~/.aws/config'''
This file contains some account information you will need to tweak. You may want to configure something like this:
[default]
region = us-west-2
[profile pangenomics-developer]
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/bill@ucsc.edu
The 'role_arn' line contains the role name and the account number you are accessing. You can see a list of live account numbers here: [[AWS Account List and Numbers]]. Find the account number you need and enter it on the role_arn line, along with the role name, which you will get from the cluster-admin group when you are granted access.
The 'mfa_serial' line contains the identifier for your MFA device. It will always look like '''"arn:aws:iam::652235167018:mfa/[your_iam_username]"'''. The account number listed there will always be '652235167018' because that is the account number of the top level 'gi-gateway' account.
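If you are unsure of your MFA device's exact ARN, the standard 'aws iam list-mfa-devices' command will show it. This is a sketch rather than a tested recipe - it requires live credentials, and must be run as your gi-gateway IAM user using your default profile:
$ aws iam list-mfa-devices --query 'MFADevices[].SerialNumber' --output text
The command prints the exact ARN string to paste onto the mfa_serial line.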
Once that is configured, you should be able to reference the profile you just created when using the aws command, like so:
$ aws s3 ls --profile pangenomics-developer
It will ask you for your MFA code and then run the command. The session token created when you enter the MFA code is valid for one hour by default, so you can run other 'aws' CLI commands for an hour without re-authenticating with MFA. After one hour, you will need to authenticate via MFA again.
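To sanity-check that the profile and role assumption are working, one option (an illustrative command, using the example profile name from the config above) is:

```shell
# Prints the account number and assumed-role ARN you are operating as;
# useful for confirming you landed in the right sub-account.
aws sts get-caller-identity --profile pangenomics-developer
```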
You can extend your session length from one hour to up to twelve hours by utilizing the AWS Security Token Service (AWS STS). See this page for more information on how to do this:
https://docs.aws.amazon.com/cli/latest/reference/sts/assume-role.html#examples
The examples at the bottom are particularly useful.
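As a sketch of what a longer session request looks like (using the example role and MFA ARNs from the config above - substitute your own; note the role's configured maximum session duration must allow twelve hours, which an administrator may need to raise):

```shell
# Request 12-hour (43200 second) temporary credentials for the developer
# role, authenticating with a current code from your MFA device.
aws sts assume-role \
  --role-arn arn:aws:iam::422448306679:role/developer \
  --serial-number arn:aws:iam::652235167018:mfa/bill@ucsc.edu \
  --token-code 123456 \
  --role-session-name my-long-session \
  --duration-seconds 43200
```

The JSON response contains an AccessKeyId, SecretAccessKey and SessionToken, which you can export as the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN environment variables for subsequent commands.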
bf758ffcdea566f910f0a7f218a4d3b8c72498d6
123
122
2019-02-07T18:49:46Z
Weiler
3
/* Getting Amazon Web Services Access */
wikitext
text/x-wiki
__TOC__
== Getting AWS (Amazon Web Services) Access ==
The Genomics Institute maintains a number of AWS accounts, each supporting different projects. If you become associated with one or more of those projects, you will likely need access to the corresponding account or accounts. We manage AWS IAM access with a single 'top level' account that everyone logs into; from there, you "Switch Role" into the sub-account where you actually run things.
To get access, have your PI or Project Manager email cluster-admin (cluster-admin@soe.ucsc.edu) requesting an AWS account for you and naming the projects you will need access to. The cluster-admin group will contact you with your login credentials. Once you log in, you can change your password if you wish and set up MFA (Multi-Factor Authentication) for your account. MFA is required in order to "Switch Role" into any of the sub-accounts for the projects you are working on.
The top level account is known as "gi-gateway"; use this URL to log in to it:
https://gi-gateway.signin.aws.amazon.com/console
When you log in, you '''may''' see a couple of error messages on the AWS dashboard saying you don't have access to view certain resources - '''this is normal''', so just ignore them.
== Configuring Account Credentials ==
Once you log in to gi-gateway, you will have very few permissions to do anything there. This is normal: you will not be doing any work in that account. The gi-gateway account exists only to authenticate you to AWS.
'''Changing Your Password'''
You can change your password by clicking on your username on the top right of the web browser window, just to the right of the little bell. If your username is bill@ucsc.edu, for example:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window.
* Click the '''"My Security Credentials"''' drop-down menu option.
* Click the '''"Change Password"''' button to change your password.
You will also need to configure MFA on your account before you will be allowed to switch roles into another account.
'''Configuring MFA'''
The most common way to configure MFA (Multi-Factor Authentication) is with '''Google Authenticator''', a free app available for Apple and Android phones and tablets; download it from your device's app store to get started. Other MFA apps may also work, but we have not tested everything out there.
Once you have Google Authenticator installed, log into the gi-gateway account using the above URL, then:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"My Security Credentials"''' drop-down menu option.
* Scroll down to the MFA (Multi-Factor Authentication) section of the page, and click '''"Assign MFA Device"'''.
* In the following menu select '''"Virtual MFA Device"'''.
* In the following window click the '''"Show QR Code"''' link, and the MFA QR barcode will appear on your screen.
* Open the Google Authenticator app on your mobile device, and tap the "+" symbol in the top right corner of the app to add an account.
* Select "Scan Barcode" in the Google Authenticator app, then aim your mobile device's camera at the QR barcode.
* The MFA device should then be set up, and you should see a 6-digit code with a small timer to the right of it. Type the code currently displayed into your web browser when asked, wait for the next code to appear after the timer expires, and type that into the second field. The page should then inform you that you have successfully associated an MFA device with your account.
Once you have associated an MFA device with the 'gi-gateway' account, '''log out''', then log back in. You will be asked for your username and password, and then for your MFA code, which you can view by opening Google Authenticator and reading the code it displays at that moment. The code changes every 30 seconds or so. '''You must log out and log back in using MFA before you will be able to switch roles!'''
== Switching Roles into Another AWS Account ==
Now that you have configured a password and enabled MFA, you can "Switch Role" into another account and begin working there. The first time you switch into an account you will be asked a few questions; after that, the role is remembered as a menu item you can click to switch quickly. As an example, assume you want to switch into the 'pangenomics' AWS account and the cluster-admin group has already granted you access. Log into the 'gi-gateway' account at the URL listed here (same as above):
https://gi-gateway.signin.aws.amazon.com/console
Do the following to switch roles into the 'pangenomics' account (as an example):
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"Switch Role"''' option in the drop-down menu.
* In the following menu it will ask you about the role you will be assuming. In our example, enter the following:
** Account* = pangenomics
** Role* = developer
** Display Name = [leave blank, or use a short phrase]
** Color = [choose a color for this role]
* Then click the '''"Switch Role"''' button.
If all went well you should be dumped into the 'pangenomics' account, and you should be identified in the top right hand corner of the page as '''"developer @ pangenomics"''', indicating your role and the account you are active in. You can then work as normal in that account. If you have not yet been given access to that role, you will receive an error message and not be allowed to switch roles.
'''NOTE:''' When you switch roles, it may dump you into a region that you don't expect it to. Always verify the region you are in by looking at the top right of the web page - it will display your region there. Most of our stuff exists in "Oregon" (us-west-2), but some items appear in other regions on a per-case basis.
If you wish to switch context back to the 'gi-gateway' account in order to manage something, or to switch to another role in another account, simply:
* Click '''"developer @ pangenomics"''' in the top right corner of the window.
* Select '''"Back to bill@ucsc.edu"'''
You will then be sent back to the 'gi-gateway' context, where you can add another role to switch into, manage your credentials, or switch roles again.
== API Access and Secret Keys ==
If you require programmatic access to AWS, you are very likely familiar with the AWS concept of Access Keys and Secret Keys, which scripts can use to authenticate to AWS and call its APIs without going through the web console. In the past, access keys and secret keys could be used with no further authentication. This is a security risk: the keys must be carefully guarded, because anyone who obtains them can rack up charges on your AWS account without your knowledge!
Under the "Assume Role" mechanism we now use, you can still create Access Keys and Secret Keys, '''but only while logged into the gi-gateway account'''. Do not try to create keys while you have "Switched Roles" into another account. Keys created in the top level 'gi-gateway' account will work for you in any sub-account you can switch roles into. However, you will need a little more configuration for your keys to work from a UNIX command line.
Assuming you have the '''"aws"''' command line tool installed (installation is outside the scope of this page), configure it using the steps outlined here:
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
That document has a lot of other really useful information in it - if you plan on using keys for API access, we highly recommend reading it through.
If you plan on using keys for API access, at minimum you will need to configure the "aws" utility and then tweak its config a bit for our setup. To start, run "aws configure". It should look something like this:
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]:
Most folks start that way. The command creates two files:
~/.aws/config
~/.aws/credentials
Both files are needed to access AWS via the 'aws' command.
'''~/.aws/credentials'''
This file contains your access key and secret key, and should not need to be modified after running 'aws configure'. The same keys can be used with any role in any account you have access to.
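For reference, after running 'aws configure' with the example values above, ~/.aws/credentials contains just the keys (shown here with the same placeholder values; yours will differ):

```ini
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```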
'''~/.aws/config'''
This file contains some account information you will need to tweak. You may want to configure something like this:
[default]
region = us-west-2
[profile pangenomics-developer]
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/bill@ucsc.edu
The 'role_arn' line contains the account number and role you are accessing. You can see a list of live account numbers here: [[AWS Account List and Numbers]]. Find the account number you need and enter it on the role_arn line, along with the role name. You will get the role name from the cluster-admin group when you are granted access.
The 'mfa_serial' line contains the identifier for your MFA device. It will always look like '''"arn:aws:iam::652235167018:mfa/[your_iam_username]"'''. The account number listed there will always be '652235167018' because that is the account number of the top level 'gi-gateway' account.
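If you end up with several profiles, it can help to script the stanza rather than typing it each time. This is only a sketch: the four variable values below are this page's examples and placeholders, and must be replaced with your real account number, role name, and IAM username.

```shell
# Print a ~/.aws/config profile stanza for one sub-account role.
# All four values are examples/placeholders -- substitute your own.
ACCOUNT_ID=422448306679        # sub-account number (see AWS Account List and Numbers)
ROLE=developer                 # role name, provided by cluster-admin
IAM_USER=bill@ucsc.edu         # your IAM username in gi-gateway
PROFILE=pangenomics-developer  # any local profile name you like

cat <<EOF
[profile $PROFILE]
source_profile = default
role_arn = arn:aws:iam::$ACCOUNT_ID:role/$ROLE
mfa_serial = arn:aws:iam::652235167018:mfa/$IAM_USER
EOF
```

Review the output, then append it to ~/.aws/config yourself rather than redirecting straight into the file.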
Once that is configured, you should be able to reference the profile you just created when using the aws command, like so:
$ aws s3 ls --profile pangenomics-developer
It will ask you for your MFA code and then run the command. The session token created when you enter the MFA code is valid for one hour by default, so you can run other 'aws' CLI commands for an hour without re-authenticating. After that hour, you will need to authenticate via MFA again.
You can extend your session length from one hour to up to twelve hours by utilizing the AWS Security Token Service (AWS STS). See this page for more information on how to do this:
https://docs.aws.amazon.com/cli/latest/reference/sts/assume-role.html#examples
The examples at the bottom are particularly useful.
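As a rough sketch of that flow (using this page's example ARNs; 43200 seconds = 12 hours): you call 'sts assume-role' with your MFA serial and a current MFA code, then export the temporary credentials it returns. The actual 'aws' call is shown commented out because it requires live keys and a valid MFA code; the export step below runs against a sample response of the same shape.

```shell
# Request 12-hour temporary credentials via STS. Substitute your own
# role ARN, MFA serial, and the 6-digit code currently shown in your app:
#
#   aws sts assume-role \
#       --role-arn arn:aws:iam::422448306679:role/developer \
#       --role-session-name my-12h-session \
#       --serial-number arn:aws:iam::652235167018:mfa/bill@ucsc.edu \
#       --token-code 123456 \
#       --duration-seconds 43200
#
# The call prints JSON containing temporary credentials. A response has
# this shape (values here are made-up samples, not real credentials):
RESPONSE='{"Credentials": {"AccessKeyId": "ASIAEXAMPLEKEYID", "SecretAccessKey": "examplesecret", "SessionToken": "exampletoken"}}'

# Pull one field out of the JSON (sed keeps this dependency-free; jq is nicer).
get_field() { printf '%s' "$RESPONSE" | sed -n "s/.*\"$1\": \"\([^\"]*\)\".*/\1/p"; }

# Export the three values so subsequent 'aws' commands use the STS session
# instead of your long-lived keys:
export AWS_ACCESS_KEY_ID=$(get_field AccessKeyId)
export AWS_SECRET_ACCESS_KEY=$(get_field SecretAccessKey)
export AWS_SESSION_TOKEN=$(get_field SessionToken)
echo "$AWS_ACCESS_KEY_ID"
```

Once the three environment variables are exported, plain 'aws' commands (no --profile needed) run under the 12-hour session until it expires.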
24f088b22a7e64aea8e5c45818ce4678c1f9e227
124
123
2019-02-07T18:50:38Z
Weiler
3
/* API Access and Secret Keys */
wikitext
text/x-wiki
__TOC__
== Getting Amazon Web Services Access ==
The Genomics Institute has a series of AWS Accounts that all support different projects. Often if you become associated with one or more of those projects, you will need access to that account or accounts. The way we are managing AWS IAM Account Access is that we have one AWS account that is the 'top level' account that everyone gets access to, and then, once you log in there, you can "Switch Role" into another sub-account that you are running things in.
To get access, you will need your PI or Project Manager to email cluster-admin (cluster-admin@soe.ucsc.edu) asking for an AWS account for you, and also in that email to name the projects you will have access to. The cluster-admin group will contact you with your credentials to login. Once you login, you can change your password if you want to and also you will be able to set up MFA (Multi Factor Authentication) for your account. You will be required to use MFA in order to "Switch Role" into any of the sub-accounts for the projects you are working on.
The login URL to use when logging in to the top level account is listed below. The top level account is known as "gi-gateway":
https://gi-gateway.signin.aws.amazon.com/console
When you login, you '''may''' see a couple error messages on the AWS dashboard saying you don't have access to view certain resources - '''this is normal''', so just ignore the error messages.
== Configuring Account Credentials ==
Once you login to the gi-gateway, you will have very few permissions to do anything there - which is normal, since you will not be working in that account anyway. The gi-gateway account is just there to authenticate you to AWS.
'''Changing Your Password'''
You can change your password by clicking on your username on the top right of the web browser window, just to the right of the little bell. If your username is bill@ucsc.edu, for example:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window.
* Click the '''"My Security Credentials"''' drop-down menu option.
* Click the '''"Change Password"''' button to change your password.
You will also need to configure MFA on your account before you will be allowed to switch roles into another account.
'''Configuring MFA'''
To configure MFA (Multi Factor Authentication), the most common way to do it is to use '''Google Authenticator''', which is an app available for Apple and Android based cell phones and mobile devices. The app is free, simply download it from the app store to your cell phone or tablet to get started. Other MFA apps may also work but we have not tested everything out there.
Once you have Google Authenticator installed, log into the gi-gateway account using the above URL, then:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"My Security Credentials"''' drop-down menu option.
* Scroll down to the MFA (Multi-Factor Authentication) section of the page, and click '''"Assign MFA Device"'''.
* In the following menu select '''"Virtual MFA Device"'''.
* In the following window click the '''"Show QR Code"''' link, and the MFA QR barcode will appear on your screen.
* Open the Google Authenticator app on your mobile device, and click the little "+" symbol in the top right corner
of the app to add an account.
* You will then need to select "Scan Barcode" in the Google Authenticaor app to continue, and aim your mobile device camera
at the QR barcode.
* The new account MFA device should then be set up and you should see a 6 digit number with a small timer to the right of it.
You must type in one 6 digit code that it displays into your web browser when asked, then wait for the next code to appear
after the timer expires, and type that into the second field. It should then inform you that you have successfully associated
an MFA device with your account.
Once you have associated an MFA device with the 'gi-gateway' Account, '''log out''', then log back in. It will ask for your username and password, and then ask for your MFA code, which you can view by opening Google Authenticator and seeing what code it is displaying at that time. The code changes every 30 seconds or so. '''You must log out first and log back in using MFA in order to be able to switch roles!!!'''
== Switching Roles into Another AWS Account ==
Now that you have configured a password and enabled MFA, you will be allowed to "Switch Roles" into another account such that you can begin work there. The first time you switch roles into an account it will ask you a few questions, but subsequently it will remember which roles you have access to and they will become a menu item you can click on to quickly switch roles. Let's assume that you want to switch to the 'pangenomics' AWS account, and you have been already granted access to do so by the cluster-admin group. After logging into the 'gi-gateway' account at the URL listed here (same as above):
https://gi-gateway.signin.aws.amazon.com/console
Do the following to switch roles into the 'pangenomics' account (as an example):
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"Switch Role"''' option in the drop-down menu.
* In the following menu it will ask you about the role you will be assuming. In our example we will use the following:
-Account* = pangenomics
-Role* = developer
-Display Name = [leave blank, or use a short phrase]
-Color = [choose a color for this role]
* Then click the "Switch Role" button.
If all went well you should be dumped into the 'pangenomics' account, and you should be identified in the top right hand corner of the page as '''"developer @ pangenomics"''', indicating your role and the account you are active in. You can then work as normal in that account. If you have not yet been given access to that role, you will receive an error message and not be allowed to switch roles.
'''NOTE:''' When you switch roles, it may dump you into a region that you don't expect it to. Always verify the region you are in by looking at the top right of the web page - it will display your region there. Most of our stuff exists in "Oregon" (us-west-2), but some items appear in other regions on a per-case basis.
If you wish to switch context back to the 'gi-gateway' account in order to manage something, or to switch to another role in another account, simple:
* Click '''"developer @ pangenomics"''' in the top right corner of the window.
* Select '''"Back to bill@ucsc.edu"'''
You will then be sent back to the 'gi-gateway' context, and you can add another role to switch into, manage your credentials and further switch roles.
== API Access and Secret Keys ==
If you require programmatic access to AWS, you will very likely be familiar with the AWS concept of Access Keys and Secret Keys, which can be used by scripts to authenticate yourself to AWS and use the APIs there without using the web console to authenticate. In the past, access keys and secret keys could be used by users with no further authentication. This introduces a security risk, as the management of those keys must be carefully guarded - if anyone gets your keys, they can rack up charges on your AWS account without your knowledge!
Using the "Assume Role" mechanism we are now using, Access Keys and Secret Keys can still be created by users '''while logged into the gi-gateway account only'''. Do not try to create keys while you have "Switched Roles" into another account. Keys you create in the top level 'gi-gateway' account will work for you in any sub-account you have access to switch roles to. You will need to do a little more configuration for your keys to work from a UNIX command line however.
When using the '''"aws"''' command line tool, assuming you have it installed (the process of which is outside the scope of this document), you would use the steps outlined in this document to configure it:
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
That document has a lot of other really useful information in it - if you plan on using keys for API access, we highly recommend reading it through.
Generically, if you plan on using keys for API Access, minimally you will need to configure the "aws" utility and then tweak the config a bit for our setup. To start, run "aws configure". It should look something like this:
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]:
Most folks do that to start. It creates two files:
~/.aws/config
~/.aws/credentials
Those two files are important to access AWS via the 'aws' command.
'''~/.aws/credentials'''
This file contains your access key and secret key, and should not need to be modified after running 'aws configure'. Your same keys can be used to access any roles in any accounts you have access to.
'''~/.aws/config'''
This file contains some account information you will need to tweak. You may want to configure something like this:
[default]
region = us-west-2
[profile pangenomics-developer]
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/bill@ucsc.edu
The 'role-arn' line contains the role and account number you are accessing. You can see a list of live account numbers here: [[AWS Account List and Numbers]]. Find the account number you need and enter it on the role_arn line, as well as the role name. You will get the role name from the cluster-admin group when you get access.
The 'mfa_serial' line contains the identifier for your MFA device. It will always look like '''"arn:aws:iam::652235167018:mfa/[your_iam_username]"'''. The account number listed there will always be '652235167018' because that is the account number of the top level 'gi-gateway' account.
Once that is configured, you should be able to reference the profile you just created when using the aws command, like so:
$ aws s3 ls --profile pangenomics-developer
It will ask you for your MFA code and then run the command. Once you enter the MFA code, the token it creates will be valid by default for one hour, so you can run other 'aws' cli commands for one hour without the need to re-authenticate with MFA. After one hour, you will need to authenticate via MFA again.
You can extend your session length from one hour to twelve hours but utilizing the AWS Security Token Service (AWS STS). See this page for more information on how to do this:
https://docs.aws.amazon.com/cli/latest/reference/sts/assume-role.html#examples
The examples at the bottom are particularly useful.
dbe617d72e95782abc2fcb36cb626525908e5740
125
124
2019-02-07T18:51:05Z
Weiler
3
/* API Access and Secret Keys */
wikitext
text/x-wiki
__TOC__
== Getting Amazon Web Services Access ==
The Genomics Institute has a series of AWS Accounts that all support different projects. Often if you become associated with one or more of those projects, you will need access to that account or accounts. The way we are managing AWS IAM Account Access is that we have one AWS account that is the 'top level' account that everyone gets access to, and then, once you log in there, you can "Switch Role" into another sub-account that you are running things in.
To get access, you will need your PI or Project Manager to email cluster-admin (cluster-admin@soe.ucsc.edu) asking for an AWS account for you, and also in that email to name the projects you will have access to. The cluster-admin group will contact you with your credentials to login. Once you login, you can change your password if you want to and also you will be able to set up MFA (Multi Factor Authentication) for your account. You will be required to use MFA in order to "Switch Role" into any of the sub-accounts for the projects you are working on.
The login URL to use when logging in to the top level account is listed below. The top level account is known as "gi-gateway":
https://gi-gateway.signin.aws.amazon.com/console
When you login, you '''may''' see a couple error messages on the AWS dashboard saying you don't have access to view certain resources - '''this is normal''', so just ignore the error messages.
== Configuring Account Credentials ==
Once you login to the gi-gateway, you will have very few permissions to do anything there - which is normal, since you will not be working in that account anyway. The gi-gateway account is just there to authenticate you to AWS.
'''Changing Your Password'''
You can change your password by clicking on your username on the top right of the web browser window, just to the right of the little bell. If your username is bill@ucsc.edu, for example:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window.
* Click the '''"My Security Credentials"''' drop-down menu option.
* Click the '''"Change Password"''' button to change your password.
You will also need to configure MFA on your account before you will be allowed to switch roles into another account.
'''Configuring MFA'''
To configure MFA (Multi Factor Authentication), the most common way to do it is to use '''Google Authenticator''', which is an app available for Apple and Android based cell phones and mobile devices. The app is free, simply download it from the app store to your cell phone or tablet to get started. Other MFA apps may also work but we have not tested everything out there.
Once you have Google Authenticator installed, log into the gi-gateway account using the above URL, then:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"My Security Credentials"''' drop-down menu option.
* Scroll down to the MFA (Multi-Factor Authentication) section of the page, and click '''"Assign MFA Device"'''.
* In the following menu select '''"Virtual MFA Device"'''.
* In the following window click the '''"Show QR Code"''' link, and the MFA QR barcode will appear on your screen.
* Open the Google Authenticator app on your mobile device, and click the little "+" symbol in the top right corner
of the app to add an account.
* You will then need to select "Scan Barcode" in the Google Authenticaor app to continue, and aim your mobile device camera
at the QR barcode.
* The new account MFA device should then be set up and you should see a 6 digit number with a small timer to the right of it.
You must type in one 6 digit code that it displays into your web browser when asked, then wait for the next code to appear
after the timer expires, and type that into the second field. It should then inform you that you have successfully associated
an MFA device with your account.
Once you have associated an MFA device with the 'gi-gateway' Account, '''log out''', then log back in. It will ask for your username and password, and then ask for your MFA code, which you can view by opening Google Authenticator and seeing what code it is displaying at that time. The code changes every 30 seconds or so. '''You must log out first and log back in using MFA in order to be able to switch roles!!!'''
== Switching Roles into Another AWS Account ==
Now that you have configured a password and enabled MFA, you will be allowed to "Switch Roles" into another account such that you can begin work there. The first time you switch roles into an account it will ask you a few questions, but subsequently it will remember which roles you have access to and they will become a menu item you can click on to quickly switch roles. Let's assume that you want to switch to the 'pangenomics' AWS account, and you have been already granted access to do so by the cluster-admin group. After logging into the 'gi-gateway' account at the URL listed here (same as above):
https://gi-gateway.signin.aws.amazon.com/console
Do the following to switch roles into the 'pangenomics' account (as an example):
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"Switch Role"''' option in the drop-down menu.
* In the following menu it will ask you about the role you will be assuming. In our example we will use the following:
-Account* = pangenomics
-Role* = developer
-Display Name = [leave blank, or use a short phrase]
-Color = [choose a color for this role]
* Then click the "Switch Role" button.
If all went well you should be dumped into the 'pangenomics' account, and you should be identified in the top right hand corner of the page as '''"developer @ pangenomics"''', indicating your role and the account you are active in. You can then work as normal in that account. If you have not yet been given access to that role, you will receive an error message and not be allowed to switch roles.
'''NOTE:''' When you switch roles, it may dump you into a region that you don't expect it to. Always verify the region you are in by looking at the top right of the web page - it will display your region there. Most of our stuff exists in "Oregon" (us-west-2), but some items appear in other regions on a per-case basis.
If you wish to switch context back to the 'gi-gateway' account in order to manage something, or to switch to another role in another account, simple:
* Click '''"developer @ pangenomics"''' in the top right corner of the window.
* Select '''"Back to bill@ucsc.edu"'''
You will then be sent back to the 'gi-gateway' context, and you can add another role to switch into, manage your credentials and further switch roles.
== API Access and Secret Keys ==
__TOC__
== Getting AWS (Amazon Web Services) Access ==
The Genomics Institute has a series of AWS accounts, each supporting a different project. If you become associated with one or more of those projects, you will need access to the corresponding account or accounts. We manage AWS IAM access through a single 'top level' account that everyone logs into; once you log in there, you can "Switch Role" into the sub-account where your project runs.
To get access, have your PI or Project Manager email cluster-admin (cluster-admin@soe.ucsc.edu) requesting an AWS account for you, naming in that email the projects you will need access to. The cluster-admin group will contact you with your login credentials. Once you log in, you can change your password if you want to, and you will be able to set up MFA (Multi-Factor Authentication) for your account. MFA is required in order to "Switch Role" into any of the sub-accounts for the projects you are working on.
The top level account is known as "gi-gateway"; its login URL is:
https://gi-gateway.signin.aws.amazon.com/console
When you log in, you '''may''' see a couple of error messages on the AWS dashboard saying you don't have access to view certain resources - '''this is normal''', so just ignore them.
== Configuring Account Credentials ==
Once you log in to gi-gateway, you will have very few permissions there - which is normal, since you will not be working in that account anyway. The gi-gateway account exists only to authenticate you to AWS.
'''Changing Your Password'''
You can change your password by clicking your username at the top right of the browser window, just to the right of the little bell icon. If your username is bill@ucsc.edu, for example:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window.
* Click the '''"My Security Credentials"''' drop-down menu option.
* Click the '''"Change Password"''' button to change your password.
Note that we have a password strength policy in place, so your password must conform to the following requirements:
* Your password must be at least 10 characters long
* Your password must contain at least one lowercase letter
* Your password must contain at least one non-alphanumeric character
* Your password must contain at least one number
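For scripting purposes, the four rules above are easy to encode. Below is a minimal local sanity check, illustrative only - this is not an AWS API, and the real policy is enforced server-side when you change your password:

```python
import re

def meets_gi_password_policy(password: str) -> bool:
    """Local sanity check mirroring the four gi-gateway password rules above.

    Illustrative only; AWS enforces the actual policy when you set the password.
    """
    return (
        len(password) >= 10                                    # at least 10 characters
        and re.search(r"[a-z]", password) is not None          # at least one lowercase letter
        and re.search(r"[^a-zA-Z0-9]", password) is not None   # at least one non-alphanumeric
        and re.search(r"[0-9]", password) is not None          # at least one number
    )
```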
You will also need to configure MFA on your account before you will be allowed to switch roles into another account.
'''Configuring MFA'''
To configure MFA (Multi-Factor Authentication), the most common approach is '''Google Authenticator''', a free app available for Apple and Android phones and tablets; download it from your device's app store to get started. Other MFA apps may also work, but we have not tested everything out there.
Once you have Google Authenticator installed, log into the gi-gateway account using the above URL, then:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"My Security Credentials"''' drop-down menu option.
* Scroll down to the MFA (Multi-Factor Authentication) section of the page, and click '''"Assign MFA Device"'''.
* In the following menu select '''"Virtual MFA Device"'''.
* In the following window click the '''"Show QR Code"''' link, and the MFA QR barcode will appear on your screen.
* Open the Google Authenticator app on your mobile device, and click the little "+" symbol in the top right corner of the app to add an account.
* Select "Scan Barcode" in the Google Authenticator app, and aim your mobile device's camera at the QR barcode.
* The new MFA device should then be set up, and you should see a 6-digit number with a small timer to the right of it. Type the 6-digit code it displays into your web browser when asked, then wait for the next code to appear after the timer expires, and type that into the second field. It should then inform you that you have successfully associated
Once you have associated an MFA device with the 'gi-gateway' account, '''log out''', then log back in. You will be asked for your username and password, and then for your MFA code, which you can read from whatever code Google Authenticator is displaying at that time. The code changes every 30 seconds. '''You must log out and log back in using MFA in order to be able to switch roles!'''
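Incidentally, the rotating 6-digit codes are standard TOTP values (RFC 6238): an HMAC-SHA1 of the current 30-second time step, truncated to 6 digits. A minimal sketch of the algorithm, for the curious - illustrative only, use a real authenticator app for actual logins:

```python
import hashlib
import hmac
import struct
import time

def totp(secret, t=None, step=30, digits=6):
    """RFC 6238 TOTP: HMAC-SHA1 over the current time step, dynamically truncated."""
    counter = int((time.time() if t is None else t) // step)
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                                 # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# RFC 6238 test secret; at t=59 seconds the 6-digit code is "287082"
# totp(b"12345678901234567890", t=59)  -> '287082'
```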
== Switching Roles into Another AWS Account ==
Now that you have configured a password and enabled MFA, you can "Switch Roles" into another account and begin work there. The first time you switch roles into an account you will be asked a few questions; after that, the console remembers which roles you have used and lists them as menu items so you can switch quickly. Let's assume you want to switch to the 'pangenomics' AWS account and the cluster-admin group has already granted you access. First, log into the 'gi-gateway' account at the URL listed here (same as above):
https://gi-gateway.signin.aws.amazon.com/console
Do the following to switch roles into the 'pangenomics' account (as an example):
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"Switch Role"''' option in the drop-down menu.
* In the following menu it will ask you about the role you will be assuming. In our example we will use the following:
** Account = pangenomics
** Role = developer
** Display Name = [leave blank, or use a short phrase]
** Color = [choose a color for this role]
* Then click the "Switch Role" button.
If all went well, you will land in the 'pangenomics' account, and the top right corner of the page will identify you as '''"developer @ pangenomics"''', indicating your role and the account you are active in. You can then work as normal in that account. If you have not yet been given access to that role, you will receive an error message and the role switch will be refused.
'''NOTE:''' When you switch roles, you may land in a region you don't expect. Always verify the region you are in by looking at the top right of the web page, where your current region is displayed. Most of our resources are in "Oregon" (us-west-2), but some items live in other regions on a case-by-case basis.
If you wish to switch context back to the 'gi-gateway' account in order to manage something, or to switch to another role in another account, simply:
* Click '''"developer @ pangenomics"''' in the top right corner of the window.
* Select '''"Back to bill@ucsc.edu"'''
You will then be sent back to the 'gi-gateway' context, and you can add another role to switch into, manage your credentials and further switch roles.
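The console's "Switch Role" screen can also be reached through a deep link, a URL format documented in the AWS IAM user guide, which is handy for bookmarking the roles you use often. A small sketch that builds such a link, using the example account alias and role name from above:

```python
from urllib.parse import urlencode

def switch_role_url(account, role_name, display_name=""):
    """Build an AWS console 'Switch Role' deep link.

    Uses the signin.aws.amazon.com/switchrole URL format documented in the
    AWS IAM user guide; 'account' may be an account alias or a 12-digit ID.
    """
    params = {"account": account, "roleName": role_name}
    if display_name:
        params["displayName"] = display_name
    return "https://signin.aws.amazon.com/switchrole?" + urlencode(params)

# e.g. switch_role_url("pangenomics", "developer") gives a bookmarkable link
```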
== API Access and Secret Keys ==
If you require programmatic access to AWS, you will likely be familiar with AWS Access Keys and Secret Keys, which scripts can use to authenticate to AWS and call its APIs without going through the web console. In the past, access keys and secret keys could be used with no further authentication. That is a security risk, because the keys themselves must be carefully guarded - if anyone gets your keys, they can rack up charges on your AWS account without your knowledge!
Under the "Assume Role" mechanism we now use, Access Keys and Secret Keys can still be created, but '''only while logged into the gi-gateway account'''. Do not try to create keys while you have "Switched Roles" into another account. Keys you create in the top level 'gi-gateway' account will work in any sub-account you are allowed to switch roles into. However, you will need a little more configuration for your keys to work from a UNIX command line.
When using the '''"aws"''' command line tool, assuming you have it installed (the process of which is outside the scope of this document), you would use the steps outlined in this document to configure it:
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
That document has a lot of other really useful information in it - if you plan on using keys for API access, we highly recommend reading it through.
Generically, if you plan on using keys for API Access, minimally you will need to configure the "aws" utility and then tweak the config a bit for our setup. To start, run "aws configure". It should look something like this:
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]:
Most folks do that to start. It creates two files:
~/.aws/config
~/.aws/credentials
Those two files are important to access AWS via the 'aws' command.
'''~/.aws/credentials'''
This file contains your access key and secret key, and should not need to be modified after running 'aws configure'. Your same keys can be used to access any roles in any accounts you have access to.
'''~/.aws/config'''
This file contains some account information you will need to tweak. You may want to configure something like this:
[default]
region = us-west-2
[profile pangenomics-developer]
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/bill@ucsc.edu
duration_seconds = 43200
The 'role_arn' line contains the role and account number you are accessing. You can see a list of live account numbers here: [[AWS Account List and Numbers]]. Find the account number you need and enter it on the role_arn line, as well as the role name. You will get the role name from the cluster-admin group when you get access.
The 'mfa_serial' line contains the identifier for your MFA device. It will always look like '''"arn:aws:iam::652235167018:mfa/[your_iam_username]"'''. The account number listed there will always be '652235167018' because that is the account number of the top level 'gi-gateway' account.
The "duration_seconds" parameter says that your session token will be 43200 seconds long (12 hours). That means you will only have to authenticate with MFA once every 12 hours '''while you are using that same shell session'''. 12 hours is the maximum you can request, although you can specify less than that. This means it won't ask you for MFA every time you run a command for the next 12 hours in your current shell.
Once that is configured, you should be able to reference the profile you just created when using the aws command, like so:
$ aws s3 ls --profile pangenomics-developer
It will ask you for your MFA code and then run the command. Once you enter the MFA code, the token it creates will be valid by default for one hour, so you can run other 'aws' cli commands for one hour without the need to re-authenticate with MFA. After one hour, you will need to authenticate via MFA again.
You can extend your session length from one hour to twelve hours but utilizing the AWS Security Token Service (AWS STS). See this page for more information on how to do this:
https://docs.aws.amazon.com/cli/latest/reference/sts/assume-role.html#examples
The examples at the bottom are particularly useful.
a9cd1dd1832749cccccbe9406103c8cbbdec2375
130
129
2019-02-08T21:14:30Z
Weiler
3
/* API Access and Secret Keys */
wikitext
text/x-wiki
__TOC__
== Getting AWS (Amazon Web Services) Access ==
The Genomics Institute maintains a series of AWS accounts, each supporting different projects. If you become associated with one or more of those projects, you will need access to the corresponding account or accounts. We manage AWS IAM access through a single 'top level' account that everyone logs into; from there, you "Switch Role" into the sub-account where your project runs.
To get access, ask your PI or Project Manager to email cluster-admin (cluster-admin@soe.ucsc.edu) requesting an AWS account for you, naming in that email the projects you will need access to. The cluster-admin group will then contact you with your login credentials. Once you log in, you can change your password if you wish, and you will be able to set up MFA (Multi-Factor Authentication) for your account. MFA is required in order to "Switch Role" into any of the sub-accounts for the projects you are working on.
The login URL to use when logging in to the top level account is listed below. The top level account is known as "gi-gateway":
https://gi-gateway.signin.aws.amazon.com/console
When you log in, you '''may''' see a couple of error messages on the AWS dashboard saying you don't have access to view certain resources - '''this is normal''', so just ignore them.
== Configuring Account Credentials ==
Once you log in to gi-gateway, you will have very few permissions to do anything there - which is normal, since you will not be working in that account anyway. The gi-gateway account exists only to authenticate you to AWS.
'''Changing Your Password'''
You can change your password by clicking on your username on the top right of the web browser window, just to the right of the little bell. If your username is bill@ucsc.edu, for example:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window.
* Click the '''"My Security Credentials"''' drop-down menu option.
* Click the '''"Change Password"''' button to change your password.
Note that we have a password strength policy in place, so your password must conform to the following requirements:
* Your password must be at least 10 characters long
* Your password must contain at least one lowercase letter
* Your password must contain at least one non-alphanumeric character
* Your password must contain at least one number
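If you want to sanity-check a candidate password against those four rules before submitting it, they can be expressed as a short script. This is just a sketch of the policy listed above, not an official AWS tool:

```python
import re

def meets_gi_password_policy(password: str) -> bool:
    """Check a candidate password against the gi-gateway policy:
    at least 10 characters, at least one lowercase letter,
    at least one number, and at least one non-alphanumeric character."""
    return (
        len(password) >= 10
        and re.search(r"[a-z]", password) is not None
        and re.search(r"[0-9]", password) is not None
        and re.search(r"[^a-zA-Z0-9]", password) is not None
    )

# Too short, and missing a number and a special character:
print(meets_gi_password_policy("short"))             # False
# 10+ characters with lowercase, a digit, and punctuation:
print(meets_gi_password_policy("correct-horse-7"))   # True
```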
You will also need to configure MFA on your account before you will be allowed to switch roles into another account.
'''Configuring MFA'''
The most common way to configure MFA (Multi-Factor Authentication) is with '''Google Authenticator''', a free app available for Apple and Android phones and mobile devices. Simply download it from the app store to your phone or tablet to get started. Other MFA apps may also work, but we have not tested everything out there.
Once you have Google Authenticator installed, log into the gi-gateway account using the above URL, then:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"My Security Credentials"''' drop-down menu option.
* Scroll down to the MFA (Multi-Factor Authentication) section of the page, and click '''"Assign MFA Device"'''.
* In the following menu select '''"Virtual MFA Device"'''.
* In the following window click the '''"Show QR Code"''' link, and the MFA QR barcode will appear on your screen.
* Open the Google Authenticator app on your mobile device, and tap the "+" symbol in the top right corner of the app to add an account.
* Select "Scan Barcode" in the Google Authenticator app, and aim your mobile device's camera at the QR barcode.
* The new MFA device should then be set up, and you should see a 6-digit number with a small timer to the right of it. When prompted, type the 6-digit code currently displayed into your web browser, wait for the next code to appear after the timer expires, and type that into the second field. It should then inform you that you have successfully associated an MFA device with your account.
Once you have associated an MFA device with the 'gi-gateway' account, '''log out''', then log back in. It will ask for your username and password, and then for your MFA code, which you can view by opening Google Authenticator and reading the code it is displaying at that time. The code changes every 30 seconds. '''You must log out and log back in using MFA before you will be able to switch roles!'''
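For the curious, the 6-digit codes Google Authenticator displays are standard TOTP values (RFC 6238): an HMAC-SHA1 over a 30-second time counter, truncated to 6 digits. A minimal sketch follows; the key below is the published RFC 6238 test key, not a real secret:

```python
import hmac, hashlib, struct, time

def totp(secret: bytes, timestamp: int, digits: int = 6, step: int = 30) -> str:
    """Compute an RFC 6238 TOTP code for the given Unix timestamp."""
    counter = timestamp // step                       # 30-second time window
    msg = struct.pack(">Q", counter)                  # counter as 8-byte big-endian
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                        # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# RFC 6238 test vector: this key and timestamp yield 94287082 (8 digits).
print(totp(b"12345678901234567890", 59, digits=8))   # 94287082
# What your app shows right now would be totp(your_secret, int(time.time())).
```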
== Switching Roles into Another AWS Account ==
Now that you have configured a password and enabled MFA, you will be allowed to "Switch Roles" into another account so that you can begin work there. The first time you switch roles into an account it will ask you a few questions; after that, it will remember which roles you have access to and present them as menu items you can click to switch roles quickly. Let's assume that you want to switch to the 'pangenomics' AWS account and that the cluster-admin group has already granted you access to do so. After logging into the 'gi-gateway' account at the URL listed here (same as above):
https://gi-gateway.signin.aws.amazon.com/console
Do the following to switch roles into the 'pangenomics' account (as an example):
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"Switch Role"''' option in the drop-down menu.
* In the following menu it will ask you about the role you will be assuming. In our example we will use the following:
 Account* = pangenomics
 Role* = developer
 Display Name = [leave blank, or use a short phrase]
 Color = [choose a color for this role]
* Then click the '''"Switch Role"''' button.
If all went well, you will land in the 'pangenomics' account, identified in the top right corner of the page as '''"developer @ pangenomics"''' - your role and the account you are active in. You can then work as normal in that account. If you have not yet been granted access to that role, you will receive an error message and will not be allowed to switch roles.
'''NOTE:''' When you switch roles, you may land in a region you don't expect. Always verify which region you are in by looking at the top right of the web page - it is displayed there. Most of our resources are in "Oregon" (us-west-2), but some items live in other regions on a case-by-case basis.
If you wish to switch context back to the 'gi-gateway' account in order to manage something, or to switch to another role in another account, simply:
* Click '''"developer @ pangenomics"''' in the top right corner of the window.
* Select '''"Back to bill@ucsc.edu"'''
You will then be sent back to the 'gi-gateway' context, where you can add another role to switch into, manage your credentials, or switch roles again.
== API Access and Secret Keys ==
If you require programmatic access to AWS, you will very likely be familiar with AWS Access Keys and Secret Keys, which scripts can use to authenticate to AWS and call its APIs without going through the web console. In the past, access keys and secret keys could be used with no further authentication. This is a security risk: the keys must be carefully guarded, because anyone who obtains yours can rack up charges on your AWS account without your knowledge!
Under the "Assume Role" mechanism we now use, Access Keys and Secret Keys can still be created by users '''while logged into the gi-gateway account only'''. Do not try to create keys while you have "Switched Roles" into another account. Keys you create in the top level 'gi-gateway' account will work for you in any sub-account you can switch roles into. You will need to do a little more configuration for your keys to work from a UNIX command line, however.
When using the '''"aws"''' command line tool (assuming you have it installed; installation is outside the scope of this document), configure it using the steps outlined here:
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
That document has a lot of other really useful information in it - if you plan on using keys for API access, we highly recommend reading it through.
If you plan on using keys for API access, at minimum you will need to configure the "aws" utility and then tweak the config a bit for our setup. To start, run "aws configure". It should look something like this:
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]:
Most folks do that to start. It creates two files:
~/.aws/config
~/.aws/credentials
Those two files are important for accessing AWS via the 'aws' command.
'''~/.aws/credentials'''
This file contains your access key and secret key, and should not need to be modified after running 'aws configure'. Your same keys can be used to access any roles in any accounts you have access to.
'''~/.aws/config'''
This file contains some account information you will need to tweak. You may want to configure something like this:
[default]
region = us-west-2
[profile pangenomics-developer]
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/bill@ucsc.edu
duration_seconds = 43200
The 'role_arn' line contains the role and account number you are accessing. You can see a list of live account numbers here: [[AWS Account List and Numbers]]. Find the account number you need and enter it on the role_arn line, along with the role name. You will get the role name from the cluster-admin group when they grant you access.
The 'mfa_serial' line contains the identifier for your MFA device. It will always look like '''"arn:aws:iam::652235167018:mfa/[your_iam_username]"'''. The account number listed there will always be '652235167018' because that is the account number of the top level 'gi-gateway' account.
The "duration_seconds" parameter requests a session token lasting 43200 seconds (12 hours). That means you will only have to authenticate with MFA once every 12 hours '''while you are using that same shell session''' - it won't ask you for MFA every time you run a command during those 12 hours. 12 hours is the maximum you can request, although you can specify less.
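The config above is plain INI, so you can sanity-check a profile with Python's standard configparser. This sketch parses the example profile from above and pulls out its key fields; the account number and role name are the examples shown earlier, so substitute your own:

```python
import configparser

# The same example profile shown above, inlined as a string for illustration.
sample = """
[default]
region = us-west-2

[profile pangenomics-developer]
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/bill@ucsc.edu
duration_seconds = 43200
"""

config = configparser.ConfigParser()
config.read_string(sample)
# For the real file you would instead use:
#   config.read(os.path.expanduser("~/.aws/config"))

profile = config["profile pangenomics-developer"]
account_id = profile["role_arn"].split(":")[4]       # ARN fields are colon-separated
role_name = profile["role_arn"].rsplit("/", 1)[1]    # role name follows the last "/"
hours = int(profile["duration_seconds"]) / 3600      # 43200 seconds -> 12 hours

print(account_id, role_name, hours)   # 422448306679 developer 12.0
```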
Once that is configured, you should be able to reference the profile you just created when using the aws command, like so:
$ aws s3 ls --profile pangenomics-developer
It will ask you for your MFA code and then run the command. Once you enter the MFA code, the token it creates is valid by default for one hour, so you can run other 'aws' CLI commands for an hour without re-authenticating with MFA. After one hour, you will need to authenticate via MFA again.
You can extend your session length from one hour to twelve hours by utilizing the AWS Security Token Service (AWS STS). See this page for more information:
https://docs.aws.amazon.com/cli/latest/reference/sts/assume-role.html#examples
The examples at the bottom are particularly useful.
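The `aws sts assume-role` command shown in those examples prints a JSON blob of temporary credentials, and a common pattern is to turn that blob into environment-variable exports for your shell. Here is a sketch that parses a sample of that JSON; the credential values below are placeholders, not real keys:

```python
import json

# Sample of the JSON that `aws sts assume-role` writes to stdout
# (the values here are placeholders, not real credentials).
sts_output = """
{
  "Credentials": {
    "AccessKeyId": "ASIAEXAMPLEKEYID",
    "SecretAccessKey": "exampleSecretKey",
    "SessionToken": "exampleSessionToken",
    "Expiration": "2019-02-09T09:27:36Z"
  }
}
"""

creds = json.loads(sts_output)["Credentials"]
exports = [
    f'export AWS_ACCESS_KEY_ID={creds["AccessKeyId"]}',
    f'export AWS_SECRET_ACCESS_KEY={creds["SecretAccessKey"]}',
    f'export AWS_SESSION_TOKEN={creds["SessionToken"]}',
]
print("\n".join(exports))   # paste these lines into your shell to use the session
```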
b2189eff93e229dffb577f0aed2950ccaf379ce8
131
130
2019-02-08T21:27:36Z
Weiler
3
/* API Access and Secret Keys */
wikitext
text/x-wiki
__TOC__
== Getting AWS (Amazon Web Services) Access ==
The Genomics Institute has a series of AWS Accounts that all support different projects. Often if you become associated with one or more of those projects, you will need access to that account or accounts. The way we are managing AWS IAM Account Access is that we have one AWS account that is the 'top level' account that everyone gets access to, and then, once you log in there, you can "Switch Role" into another sub-account that you are running things in.
To get access, you will need your PI or Project Manager to email cluster-admin (cluster-admin@soe.ucsc.edu) asking for an AWS account for you, and also in that email to name the projects you will have access to. The cluster-admin group will contact you with your credentials to login. Once you login, you can change your password if you want to and also you will be able to set up MFA (Multi Factor Authentication) for your account. You will be required to use MFA in order to "Switch Role" into any of the sub-accounts for the projects you are working on.
The login URL to use when logging in to the top level account is listed below. The top level account is known as "gi-gateway":
https://gi-gateway.signin.aws.amazon.com/console
When you login, you '''may''' see a couple error messages on the AWS dashboard saying you don't have access to view certain resources - '''this is normal''', so just ignore the error messages.
== Configuring Account Credentials ==
Once you login to the gi-gateway, you will have very few permissions to do anything there - which is normal, since you will not be working in that account anyway. The gi-gateway account is just there to authenticate you to AWS.
'''Changing Your Password'''
You can change your password by clicking on your username on the top right of the web browser window, just to the right of the little bell. If your username is bill@ucsc.edu, for example:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window.
* Click the '''"My Security Credentials"''' drop-down menu option.
* Click the '''"Change Password"''' button to change your password.
Note that we have a password strength policy in place, so your password must conform to the following requirements:
* Your password must be at least 10 characters long
* Your password must contain at least one lowercase letter
* Your password must contain at least one non-alphanumeric character
* Your password must contain at least one number
You will also need to configure MFA on your account before you will be allowed to switch roles into another account.
'''Configuring MFA'''
To configure MFA (Multi Factor Authentication), the most common way to do it is to use '''Google Authenticator''', which is an app available for Apple and Android based cell phones and mobile devices. The app is free, simply download it from the app store to your cell phone or tablet to get started. Other MFA apps may also work but we have not tested everything out there.
Once you have Google Authenticator installed, log into the gi-gateway account using the above URL, then:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"My Security Credentials"''' drop-down menu option.
* Scroll down to the MFA (Multi-Factor Authentication) section of the page, and click '''"Assign MFA Device"'''.
* In the following menu select '''"Virtual MFA Device"'''.
* In the following window click the '''"Show QR Code"''' link, and the MFA QR barcode will appear on your screen.
* Open the Google Authenticator app on your mobile device, and click the little "+" symbol in the top right corner
of the app to add an account.
* You will then need to select "Scan Barcode" in the Google Authenticaor app to continue, and aim your mobile device camera
at the QR barcode.
* The new account MFA device should then be set up and you should see a 6 digit number with a small timer to the right of it.
You must type in one 6 digit code that it displays into your web browser when asked, then wait for the next code to appear
after the timer expires, and type that into the second field. It should then inform you that you have successfully associated
an MFA device with your account.
Once you have associated an MFA device with the 'gi-gateway' Account, '''log out''', then log back in. It will ask for your username and password, and then ask for your MFA code, which you can view by opening Google Authenticator and seeing what code it is displaying at that time. The code changes every 30 seconds or so. '''You must log out first and log back in using MFA in order to be able to switch roles!!!'''
== Switching Roles into Another AWS Account ==
Now that you have configured a password and enabled MFA, you will be allowed to "Switch Roles" into another account such that you can begin work there. The first time you switch roles into an account it will ask you a few questions, but subsequently it will remember which roles you have access to and they will become a menu item you can click on to quickly switch roles. Let's assume that you want to switch to the 'pangenomics' AWS account, and you have been already granted access to do so by the cluster-admin group. After logging into the 'gi-gateway' account at the URL listed here (same as above):
https://gi-gateway.signin.aws.amazon.com/console
Do the following to switch roles into the 'pangenomics' account (as an example):
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"Switch Role"''' option in the drop-down menu.
* In the following menu it will ask you about the role you will be assuming. In our example we will use the following:
-Account* = pangenomics
-Role* = developer
-Display Name = [leave blank, or use a short phrase]
-Color = [choose a color for this role]
* Then click the "Switch Role" button.
If all went well you should be dumped into the 'pangenomics' account, and you should be identified in the top right hand corner of the page as '''"developer @ pangenomics"''', indicating your role and the account you are active in. You can then work as normal in that account. If you have not yet been given access to that role, you will receive an error message and not be allowed to switch roles.
'''NOTE:''' When you switch roles, it may dump you into a region that you don't expect it to. Always verify the region you are in by looking at the top right of the web page - it will display your region there. Most of our stuff exists in "Oregon" (us-west-2), but some items appear in other regions on a per-case basis.
If you wish to switch context back to the 'gi-gateway' account in order to manage something, or to switch to another role in another account, simple:
* Click '''"developer @ pangenomics"''' in the top right corner of the window.
* Select '''"Back to bill@ucsc.edu"'''
You will then be sent back to the 'gi-gateway' context, and you can add another role to switch into, manage your credentials and further switch roles.
== API Access and Secret Keys ==
If you require programmatic access to AWS, you will very likely be familiar with the AWS concept of Access Keys and Secret Keys, which can be used by scripts to authenticate yourself to AWS and use the APIs there without using the web console to authenticate. In the past, access keys and secret keys could be used by users with no further authentication. This introduces a security risk, as the management of those keys must be carefully guarded - if anyone gets your keys, they can rack up charges on your AWS account without your knowledge!
Using the "Assume Role" mechanism we are now using, Access Keys and Secret Keys can still be created by users '''while logged into the gi-gateway account only'''. Do not try to create keys while you have "Switched Roles" into another account. Keys you create in the top level 'gi-gateway' account will work for you in any sub-account you have access to switch roles to. You will need to do a little more configuration for your keys to work from a UNIX command line however.
When using the '''"aws"''' command line tool, assuming you have it installed (the process of which is outside the scope of this document), you would use the steps outlined in this document to configure it:
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
That document has a lot of other really useful information in it - if you plan on using keys for API access, we highly recommend reading it through.
Generically, if you plan on using keys for API Access, minimally you will need to configure the "aws" utility and then tweak the config a bit for our setup. To start, run "aws configure". It should look something like this:
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]:
Most folks do that to start. It creates two files:
~/.aws/config
~/.aws/credentials
Those two files are important to access AWS via the 'aws' command.
'''~/.aws/credentials'''
This file contains your access key and secret key, and should not need to be modified after running 'aws configure'. Your same keys can be used to access any roles in any accounts you have access to.
'''~/.aws/config'''
This file contains some account information you will need to tweak. You may want to configure something like this:
[default]
region = us-west-2
[profile pangenomics-developer]
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/bill@ucsc.edu
duration_seconds = 43200
The 'role_arn' line contains the role and account number you are accessing. You can see a list of live account numbers here:
[[AWS Account List and Numbers]]
Find the account number you need and enter it on the role_arn line, as well as the role name. You will get the role name from the cluster-admin group when you get access.
The 'mfa_serial' line contains the identifier for your MFA device. It will always look like '''"arn:aws:iam::652235167018:mfa/[your_iam_username]"'''. The account number listed there will always be '652235167018' because that is the account number of the top level 'gi-gateway' account.
The "duration_seconds" parameter says that your session token will be 43200 seconds long (12 hours). That means you will only have to authenticate with MFA once every 12 hours. 12 hours is the maximum you can request, although you can specify less than that. This means it won't ask you for MFA every time you run a command for the next 12 hours.
Once that is configured, you should be able to reference the profile you just created when using the aws command, like so:
$ aws s3 ls --profile pangenomics-developer
It will ask you for your MFA code and then run the command. Once you enter the MFA code, the token it creates will be valid by default for one hour, so you can run other 'aws' cli commands for one hour without the need to re-authenticate with MFA. After one hour, you will need to authenticate via MFA again.
You can extend your session length from one hour to twelve hours but utilizing the AWS Security Token Service (AWS STS). See this page for more information on how to do this:
https://docs.aws.amazon.com/cli/latest/reference/sts/assume-role.html#examples
The examples at the bottom are particularly useful.
e10a0de7a94a76feebabe12d64148671d555f3db
132
131
2019-02-08T21:29:52Z
Weiler
3
/* API Access and Secret Keys */
wikitext
text/x-wiki
__TOC__
== Getting AWS (Amazon Web Services) Access ==
The Genomics Institute has a series of AWS Accounts that all support different projects. Often if you become associated with one or more of those projects, you will need access to that account or accounts. The way we are managing AWS IAM Account Access is that we have one AWS account that is the 'top level' account that everyone gets access to, and then, once you log in there, you can "Switch Role" into another sub-account that you are running things in.
To get access, you will need your PI or Project Manager to email cluster-admin (cluster-admin@soe.ucsc.edu) asking for an AWS account for you, and also in that email to name the projects you will have access to. The cluster-admin group will contact you with your credentials to login. Once you login, you can change your password if you want to and also you will be able to set up MFA (Multi Factor Authentication) for your account. You will be required to use MFA in order to "Switch Role" into any of the sub-accounts for the projects you are working on.
The login URL to use when logging in to the top level account is listed below. The top level account is known as "gi-gateway":
https://gi-gateway.signin.aws.amazon.com/console
When you login, you '''may''' see a couple error messages on the AWS dashboard saying you don't have access to view certain resources - '''this is normal''', so just ignore the error messages.
== Configuring Account Credentials ==
Once you login to the gi-gateway, you will have very few permissions to do anything there - which is normal, since you will not be working in that account anyway. The gi-gateway account is just there to authenticate you to AWS.
'''Changing Your Password'''
You can change your password by clicking on your username on the top right of the web browser window, just to the right of the little bell. If your username is bill@ucsc.edu, for example:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window.
* Click the '''"My Security Credentials"''' drop-down menu option.
* Click the '''"Change Password"''' button to change your password.
Note that we have a password strength policy in place, so your password must conform to the following requirements:
* Your password must be at least 10 characters long
* Your password must contain at least one lowercase letter
* Your password must contain at least one non-alphanumeric character
* Your password must contain at least one number
You will also need to configure MFA on your account before you will be allowed to switch roles into another account.
'''Configuring MFA'''
To configure MFA (Multi Factor Authentication), the most common way to do it is to use '''Google Authenticator''', which is an app available for Apple and Android based cell phones and mobile devices. The app is free, simply download it from the app store to your cell phone or tablet to get started. Other MFA apps may also work but we have not tested everything out there.
Once you have Google Authenticator installed, log into the gi-gateway account using the above URL, then:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"My Security Credentials"''' drop-down menu option.
* Scroll down to the MFA (Multi-Factor Authentication) section of the page, and click '''"Assign MFA Device"'''.
* In the following menu select '''"Virtual MFA Device"'''.
* In the following window click the '''"Show QR Code"''' link, and the MFA QR barcode will appear on your screen.
* Open the Google Authenticator app on your mobile device, and click the little "+" symbol in the top right corner
of the app to add an account.
* You will then need to select "Scan Barcode" in the Google Authenticaor app to continue, and aim your mobile device camera
at the QR barcode.
* The new account MFA device should then be set up and you should see a 6 digit number with a small timer to the right of it.
You must type in one 6 digit code that it displays into your web browser when asked, then wait for the next code to appear
after the timer expires, and type that into the second field. It should then inform you that you have successfully associated
an MFA device with your account.
Once you have associated an MFA device with the 'gi-gateway' Account, '''log out''', then log back in. It will ask for your username and password, and then ask for your MFA code, which you can view by opening Google Authenticator and seeing what code it is displaying at that time. The code changes every 30 seconds or so. '''You must log out first and log back in using MFA in order to be able to switch roles!!!'''
== Switching Roles into Another AWS Account ==
Now that you have configured a password and enabled MFA, you will be allowed to "Switch Roles" into another account such that you can begin work there. The first time you switch roles into an account it will ask you a few questions, but subsequently it will remember which roles you have access to and they will become a menu item you can click on to quickly switch roles. Let's assume that you want to switch to the 'pangenomics' AWS account, and you have been already granted access to do so by the cluster-admin group. After logging into the 'gi-gateway' account at the URL listed here (same as above):
https://gi-gateway.signin.aws.amazon.com/console
Do the following to switch roles into the 'pangenomics' account (as an example):
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"Switch Role"''' option in the drop-down menu.
* In the following menu it will ask you about the role you will be assuming. In our example we will use the following:
** Account* = pangenomics
** Role* = developer
** Display Name = [leave blank, or use a short phrase]
** Color = [choose a color for this role]
* Then click the "Switch Role" button.
If all went well you should be dumped into the 'pangenomics' account, and you should be identified in the top right hand corner of the page as '''"developer @ pangenomics"''', indicating your role and the account you are active in. You can then work as normal in that account. If you have not yet been given access to that role, you will receive an error message and not be allowed to switch roles.
'''NOTE:''' When you switch roles, it may dump you into a region that you don't expect it to. Always verify the region you are in by looking at the top right of the web page - it will display your region there. Most of our stuff exists in "Oregon" (us-west-2), but some items appear in other regions on a per-case basis.
If you wish to switch context back to the 'gi-gateway' account in order to manage something, or to switch to another role in another account, simply:
* Click '''"developer @ pangenomics"''' in the top right corner of the window.
* Select '''"Back to bill@ucsc.edu"'''
You will then be sent back to the 'gi-gateway' context, where you can add another role to switch into, manage your credentials, or switch roles again.
== API Access and Secret Keys ==
If you require programmatic access to AWS, you are probably familiar with the AWS concept of Access Keys and Secret Keys, which scripts can use to authenticate to AWS and call its APIs without going through the web console. In the past, access keys and secret keys could be used with no further authentication. This introduces a security risk, as those keys must be carefully guarded - if anyone gets your keys, they can rack up charges on your AWS account without your knowledge!
Using the "Assume Role" mechanism we are now using, Access Keys and Secret Keys can still be created by users '''while logged into the gi-gateway account only'''. Do not try to create keys while you have "Switched Roles" into another account. Keys you create in the top level 'gi-gateway' account will work for you in any sub-account you have access to switch roles to. You will need to do a little more configuration for your keys to work from a UNIX command line however.
When using the '''"aws"''' command line tool, assuming you have it installed (the process of which is outside the scope of this document), you would use the steps outlined in this document to configure it:
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
That document has a lot of other really useful information in it - if you plan on using keys for API access, we highly recommend reading it through.
If you plan on using keys for API access, at minimum you will need to configure the "aws" utility and then tweak the config a bit for our setup. To start, run "aws configure". It should look something like this:
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]:
Most folks do that to start. It creates two files:
~/.aws/config
~/.aws/credentials
Those two files are important to access AWS via the 'aws' command.
'''~/.aws/credentials'''
This file contains your access key and secret key, and should not need to be modified after running 'aws configure'. Your same keys can be used to access any roles in any accounts you have access to.
'''~/.aws/config'''
This file contains some account information you will need to tweak. You may want to configure something like this:
[default]
region = us-west-2
[profile pangenomics-developer]
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/bill@ucsc.edu
duration_seconds = 43200
The 'role_arn' line contains the role and account number you are accessing. You can see a list of live account numbers here:
[[AWS Account List and Numbers]]
Find the account number you need and enter it on the role_arn line, as well as the role name. You will get the role name from the cluster-admin group when you get access.
The 'mfa_serial' line contains the identifier for your MFA device. It will always look like '''"arn:aws:iam::652235167018:mfa/[your_iam_username]"'''. The account number listed there will always be '652235167018' because that is the account number of the top level 'gi-gateway' account.
The "duration_seconds" parameter says that your session token will be 43200 seconds long (12 hours). That means you will only have to authenticate with MFA once every 12 hours. 12 hours is the maximum you can request, although you can specify less than that. This means it won't ask you for MFA every time you run a command for the next 12 hours.
Once that is configured, you should be able to reference the profile you just created when using the aws command, like so:
$ aws s3 ls --profile pangenomics-developer
It will ask you for your MFA code and then run the command. The session token it creates will be valid for 12 hours if you specified "duration_seconds = 43200" (if you omitted that line, the default session duration is one hour), so you can run other 'aws' CLI commands without re-authenticating with MFA for the duration of the session. After the session expires, you will need to authenticate via MFA again.
You can also extend your session length from one hour to twelve hours by utilizing the AWS Security Token Service (AWS STS). See this page for more information on how to do this:
https://docs.aws.amazon.com/cli/latest/reference/sts/assume-role.html#examples
The examples at the bottom are particularly useful.
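If you drive STS yourself as those examples do, 'aws sts assume-role' prints temporary credentials as JSON, and you hand them to subsequent commands via the standard AWS environment variables. A minimal sketch using a fabricated response - no AWS call is made here, and the credential values are placeholders:

```python
import json
import os

# Fabricated sample of the JSON structure `aws sts assume-role` prints; a real
# response also includes an "AssumedRoleUser" block.
sample = json.dumps({
    "Credentials": {
        "AccessKeyId": "ASIAEXAMPLE",
        "SecretAccessKey": "wJalrEXAMPLE",
        "SessionToken": "FQoGZXIvYXdzEXAMPLE",
        "Expiration": "2019-02-09T10:05:40Z",
    }
})

def creds_to_env(assume_role_json: str) -> dict:
    """Map assume-role output to the environment variables the CLI and SDKs honor."""
    creds = json.loads(assume_role_json)["Credentials"]
    return {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }

# Subsequent aws/SDK calls in this process pick up the temporary credentials.
os.environ.update(creds_to_env(sample))
```

Once those three variables are exported, 'aws' commands run under the assumed role until the Expiration time passes, at which point you assume the role (and MFA) again.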
bf2ff2408fbe39c523e9a8980adca855871e835f
133
132
2019-02-08T21:30:35Z
Weiler
3
/* API Access and Secret Keys */
wikitext
text/x-wiki
__TOC__
== Getting AWS (Amazon Web Services) Access ==
The Genomics Institute has a series of AWS Accounts that all support different projects. Often if you become associated with one or more of those projects, you will need access to that account or accounts. The way we are managing AWS IAM Account Access is that we have one AWS account that is the 'top level' account that everyone gets access to, and then, once you log in there, you can "Switch Role" into another sub-account that you are running things in.
To get access, you will need your PI or Project Manager to email cluster-admin (cluster-admin@soe.ucsc.edu) asking for an AWS account for you, and also in that email to name the projects you will have access to. The cluster-admin group will contact you with your credentials to login. Once you login, you can change your password if you want to and also you will be able to set up MFA (Multi Factor Authentication) for your account. You will be required to use MFA in order to "Switch Role" into any of the sub-accounts for the projects you are working on.
The login URL to use when logging in to the top level account is listed below. The top level account is known as "gi-gateway":
https://gi-gateway.signin.aws.amazon.com/console
When you login, you '''may''' see a couple error messages on the AWS dashboard saying you don't have access to view certain resources - '''this is normal''', so just ignore the error messages.
== Configuring Account Credentials ==
Once you login to the gi-gateway, you will have very few permissions to do anything there - which is normal, since you will not be working in that account anyway. The gi-gateway account is just there to authenticate you to AWS.
'''Changing Your Password'''
You can change your password by clicking on your username on the top right of the web browser window, just to the right of the little bell. If your username is bill@ucsc.edu, for example:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window.
* Click the '''"My Security Credentials"''' drop-down menu option.
* Click the '''"Change Password"''' button to change your password.
Note that we have a password strength policy in place, so your password must conform to the following requirements:
* Your password must be at least 10 characters long
* Your password must contain at least one lowercase letter
* Your password must contain at least one non-alphanumeric character
* Your password must contain at least one number
You will also need to configure MFA on your account before you will be allowed to switch roles into another account.
'''Configuring MFA'''
To configure MFA (Multi Factor Authentication), the most common way to do it is to use '''Google Authenticator''', which is an app available for Apple and Android based cell phones and mobile devices. The app is free, simply download it from the app store to your cell phone or tablet to get started. Other MFA apps may also work but we have not tested everything out there.
Once you have Google Authenticator installed, log into the gi-gateway account using the above URL, then:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"My Security Credentials"''' drop-down menu option.
* Scroll down to the MFA (Multi-Factor Authentication) section of the page, and click '''"Assign MFA Device"'''.
* In the following menu select '''"Virtual MFA Device"'''.
* In the following window click the '''"Show QR Code"''' link, and the MFA QR barcode will appear on your screen.
* Open the Google Authenticator app on your mobile device, and click the little "+" symbol in the top right corner
of the app to add an account.
* You will then need to select "Scan Barcode" in the Google Authenticaor app to continue, and aim your mobile device camera
at the QR barcode.
* The new account MFA device should then be set up and you should see a 6 digit number with a small timer to the right of it.
You must type in one 6 digit code that it displays into your web browser when asked, then wait for the next code to appear
after the timer expires, and type that into the second field. It should then inform you that you have successfully associated
an MFA device with your account.
Once you have associated an MFA device with the 'gi-gateway' Account, '''log out''', then log back in. It will ask for your username and password, and then ask for your MFA code, which you can view by opening Google Authenticator and seeing what code it is displaying at that time. The code changes every 30 seconds or so. '''You must log out first and log back in using MFA in order to be able to switch roles!!!'''
== Switching Roles into Another AWS Account ==
Now that you have configured a password and enabled MFA, you will be allowed to "Switch Roles" into another account such that you can begin work there. The first time you switch roles into an account it will ask you a few questions, but subsequently it will remember which roles you have access to and they will become a menu item you can click on to quickly switch roles. Let's assume that you want to switch to the 'pangenomics' AWS account, and you have been already granted access to do so by the cluster-admin group. After logging into the 'gi-gateway' account at the URL listed here (same as above):
https://gi-gateway.signin.aws.amazon.com/console
Do the following to switch roles into the 'pangenomics' account (as an example):
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"Switch Role"''' option in the drop-down menu.
* In the following menu it will ask you about the role you will be assuming. In our example we will use the following:
-Account* = pangenomics
-Role* = developer
-Display Name = [leave blank, or use a short phrase]
-Color = [choose a color for this role]
* Then click the "Switch Role" button.
If all went well you should be dumped into the 'pangenomics' account, and you should be identified in the top right hand corner of the page as '''"developer @ pangenomics"''', indicating your role and the account you are active in. You can then work as normal in that account. If you have not yet been given access to that role, you will receive an error message and not be allowed to switch roles.
'''NOTE:''' When you switch roles, it may dump you into a region that you don't expect it to. Always verify the region you are in by looking at the top right of the web page - it will display your region there. Most of our stuff exists in "Oregon" (us-west-2), but some items appear in other regions on a per-case basis.
If you wish to switch context back to the 'gi-gateway' account in order to manage something, or to switch to another role in another account, simple:
* Click '''"developer @ pangenomics"''' in the top right corner of the window.
* Select '''"Back to bill@ucsc.edu"'''
You will then be sent back to the 'gi-gateway' context, and you can add another role to switch into, manage your credentials and further switch roles.
== API Access and Secret Keys ==
If you require programmatic access to AWS, you will very likely be familiar with the AWS concept of Access Keys and Secret Keys, which can be used by scripts to authenticate yourself to AWS and use the APIs there without using the web console to authenticate. In the past, access keys and secret keys could be used by users with no further authentication. This introduces a security risk, as the management of those keys must be carefully guarded - if anyone gets your keys, they can rack up charges on your AWS account without your knowledge!
Using the "Assume Role" mechanism we are now using, Access Keys and Secret Keys can still be created by users '''while logged into the gi-gateway account only'''. Do not try to create keys while you have "Switched Roles" into another account. Keys you create in the top level 'gi-gateway' account will work for you in any sub-account you have access to switch roles to. You will need to do a little more configuration for your keys to work from a UNIX command line however.
When using the '''"aws"''' command line tool, assuming you have it installed (the process of which is outside the scope of this document), you would use the steps outlined in this document to configure it:
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
That document has a lot of other really useful information in it - if you plan on using keys for API access, we highly recommend reading it through.
Generically, if you plan on using keys for API Access, minimally you will need to configure the "aws" utility and then tweak the config a bit for our setup. To start, run "aws configure". It should look something like this:
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]:
Most folks do that to start. It creates two files:
~/.aws/config
~/.aws/credentials
Those two files are important to access AWS via the 'aws' command.
'''~/.aws/credentials'''
This file contains your access key and secret key, and should not need to be modified after running 'aws configure'. Your same keys can be used to access any roles in any accounts you have access to.
'''~/.aws/config'''
This file contains some account information you will need to tweak. You may want to configure something like this:
[default]
region = us-west-2
[profile pangenomics-developer]
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/bill@ucsc.edu
duration_seconds = 43200
The 'role_arn' line contains the role and account number you are accessing. You can see a list of live account numbers here:
[[AWS Account List and Numbers]]
Find the account number you need and enter it on the role_arn line, as well as the role name. You will get the role name from the cluster-admin group when you get access.
The 'mfa_serial' line contains the identifier for your MFA device. It will always look like '''"arn:aws:iam::652235167018:mfa/[your_iam_username]"'''. The account number listed there will always be '652235167018' because that is the account number of the top level 'gi-gateway' account.
The "duration_seconds" parameter says that your session token will be 43200 seconds long (12 hours). That means you will only have to authenticate with MFA once every 12 hours. 12 hours is the maximum you can request, although you can specify less than that. This means it won't ask you for MFA every time you run a command for the next 12 hours.
Once that is configured, you should be able to reference the profile you just created when using the aws command, like so:
$ aws s3 ls --profile pangenomics-developer
It will ask you for your MFA code and then run the command. Once you enter the MFA code, the token it creates will be valid for 12 hours if you specified "duration_seconds = 43200", or if you omitted that line, the default session duration is one hour, so you can run other 'aws' cli commands without the need to re-authenticate with MFA for the duration of the session. After the session expires, you will need to authenticate via MFA again.
383f386fe5cb5c240bfa4025a55bbe998b807adc
134
133
2019-02-08T22:05:40Z
Weiler
3
/* API Access and Secret Keys */
wikitext
text/x-wiki
__TOC__
== Getting AWS (Amazon Web Services) Access ==
The Genomics Institute has a series of AWS Accounts that all support different projects. Often if you become associated with one or more of those projects, you will need access to that account or accounts. The way we are managing AWS IAM Account Access is that we have one AWS account that is the 'top level' account that everyone gets access to, and then, once you log in there, you can "Switch Role" into another sub-account that you are running things in.
To get access, you will need your PI or Project Manager to email cluster-admin (cluster-admin@soe.ucsc.edu) asking for an AWS account for you, and also in that email to name the projects you will have access to. The cluster-admin group will contact you with your credentials to login. Once you login, you can change your password if you want to and also you will be able to set up MFA (Multi Factor Authentication) for your account. You will be required to use MFA in order to "Switch Role" into any of the sub-accounts for the projects you are working on.
The login URL to use when logging in to the top level account is listed below. The top level account is known as "gi-gateway":
https://gi-gateway.signin.aws.amazon.com/console
When you login, you '''may''' see a couple error messages on the AWS dashboard saying you don't have access to view certain resources - '''this is normal''', so just ignore the error messages.
== Configuring Account Credentials ==
Once you login to the gi-gateway, you will have very few permissions to do anything there - which is normal, since you will not be working in that account anyway. The gi-gateway account is just there to authenticate you to AWS.
'''Changing Your Password'''
You can change your password by clicking on your username on the top right of the web browser window, just to the right of the little bell. If your username is bill@ucsc.edu, for example:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window.
* Click the '''"My Security Credentials"''' drop-down menu option.
* Click the '''"Change Password"''' button to change your password.
Note that we have a password strength policy in place, so your password must conform to the following requirements:
* Your password must be at least 10 characters long
* Your password must contain at least one lowercase letter
* Your password must contain at least one non-alphanumeric character
* Your password must contain at least one number
You will also need to configure MFA on your account before you will be allowed to switch roles into another account.
'''Configuring MFA'''
To configure MFA (Multi Factor Authentication), the most common way to do it is to use '''Google Authenticator''', which is an app available for Apple and Android based cell phones and mobile devices. The app is free, simply download it from the app store to your cell phone or tablet to get started. Other MFA apps may also work but we have not tested everything out there.
Once you have Google Authenticator installed, log into the gi-gateway account using the above URL, then:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"My Security Credentials"''' drop-down menu option.
* Scroll down to the MFA (Multi-Factor Authentication) section of the page, and click '''"Assign MFA Device"'''.
* In the following menu select '''"Virtual MFA Device"'''.
* In the following window click the '''"Show QR Code"''' link, and the MFA QR barcode will appear on your screen.
* Open the Google Authenticator app on your mobile device, and click the little "+" symbol in the top right corner
of the app to add an account.
* You will then need to select "Scan Barcode" in the Google Authenticaor app to continue, and aim your mobile device camera
at the QR barcode.
* The new account MFA device should then be set up and you should see a 6 digit number with a small timer to the right of it.
You must type in one 6 digit code that it displays into your web browser when asked, then wait for the next code to appear
after the timer expires, and type that into the second field. It should then inform you that you have successfully associated
an MFA device with your account.
Once you have associated an MFA device with the 'gi-gateway' Account, '''log out''', then log back in. It will ask for your username and password, and then ask for your MFA code, which you can view by opening Google Authenticator and seeing what code it is displaying at that time. The code changes every 30 seconds or so. '''You must log out first and log back in using MFA in order to be able to switch roles!!!'''
== Switching Roles into Another AWS Account ==
Now that you have configured a password and enabled MFA, you will be allowed to "Switch Roles" into another account such that you can begin work there. The first time you switch roles into an account it will ask you a few questions, but subsequently it will remember which roles you have access to and they will become a menu item you can click on to quickly switch roles. Let's assume that you want to switch to the 'pangenomics' AWS account, and you have been already granted access to do so by the cluster-admin group. After logging into the 'gi-gateway' account at the URL listed here (same as above):
https://gi-gateway.signin.aws.amazon.com/console
Do the following to switch roles into the 'pangenomics' account (as an example):
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"Switch Role"''' option in the drop-down menu.
* In the following menu it will ask you about the role you will be assuming. In our example we will use the following:
-Account* = pangenomics
-Role* = developer
-Display Name = [leave blank, or use a short phrase]
-Color = [choose a color for this role]
* Then click the "Switch Role" button.
If all went well you should be dumped into the 'pangenomics' account, and you should be identified in the top right hand corner of the page as '''"developer @ pangenomics"''', indicating your role and the account you are active in. You can then work as normal in that account. If you have not yet been given access to that role, you will receive an error message and not be allowed to switch roles.
'''NOTE:''' When you switch roles, it may dump you into a region that you don't expect it to. Always verify the region you are in by looking at the top right of the web page - it will display your region there. Most of our stuff exists in "Oregon" (us-west-2), but some items appear in other regions on a per-case basis.
If you wish to switch context back to the 'gi-gateway' account in order to manage something, or to switch to another role in another account, simple:
* Click '''"developer @ pangenomics"''' in the top right corner of the window.
* Select '''"Back to bill@ucsc.edu"'''
You will then be sent back to the 'gi-gateway' context, and you can add another role to switch into, manage your credentials and further switch roles.
== API Access and Secret Keys ==
If you require programmatic access to AWS, you will very likely be familiar with the AWS concept of Access Keys and Secret Keys, which can be used by scripts to authenticate yourself to AWS and use the APIs there without using the web console to authenticate. In the past, access keys and secret keys could be used by users with no further authentication. This introduces a security risk, as the management of those keys must be carefully guarded - if anyone gets your keys, they can rack up charges on your AWS account without your knowledge!
Using the "Assume Role" mechanism we are now using, Access Keys and Secret Keys can still be created by users '''while logged into the gi-gateway account only'''. Do not try to create keys while you have "Switched Roles" into another account. Keys you create in the top level 'gi-gateway' account will work for you in any sub-account you have access to switch roles to. You will need to do a little more configuration for your keys to work from a UNIX command line however.
When using the '''"aws"''' command line tool, assuming you have it installed (the process of which is outside the scope of this document), you would use the steps outlined in this document to configure it:
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
That document has a lot of other really useful information in it - if you plan on using keys for API access, we highly recommend reading it through.
Generically, if you plan on using keys for API Access, minimally you will need to configure the "aws" utility and then tweak the config a bit for our setup. To start, run "aws configure". It should look something like this:
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]:
Most folks do that to start. It creates two files:
~/.aws/config
~/.aws/credentials
Those two files are important to access AWS via the 'aws' command.
'''~/.aws/credentials'''
This file contains your access key and secret key, and should not need to be modified after running 'aws configure'. Your same keys can be used to access any roles in any accounts you have access to.
'''~/.aws/config'''
This file contains some account information you will need to tweak. You may want to configure something like this:
[default]
region = us-west-2
[profile pangenomics-developer]
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/bill@ucsc.edu
duration_seconds = 43200
The "role_arn" line contains the role and account number you are accessing. You can see a list of live account numbers here:
[[AWS Account List and Numbers]]
Find the account number you need and enter it on the role_arn line, as well as the role name. You will get the role name from the cluster-admin group when you get access.
The 'mfa_serial' line contains the identifier for your MFA device. It will always look like '''"arn:aws:iam::652235167018:mfa/[your_iam_username]"'''. The account number listed there will always be "652235167018" because that is the account number of the top level "gi-gateway" account.
The "duration_seconds" parameter says that your session token will be 43200 seconds long (12 hours). That means you will only have to authenticate with MFA once every 12 hours. 12 hours is the maximum you can request, although you can specify less than that. This means it won't ask you for MFA every time you run a command for the next 12 hours.
Once that is configured, you should be able to reference the profile you just created when using the aws command, like so:
$ aws s3 ls --profile pangenomics-developer
It will ask you for your MFA code and then run the command. Once you enter the MFA code, the token it creates will be valid for 12 hours if you specified "duration_seconds = 43200", or if you omitted that line, the default session duration is one hour, so you can run other 'aws' cli commands without the need to re-authenticate with MFA for the duration of the session. After the session expires, you will need to authenticate via MFA again.
3ef08e4f6d1fa4ec5fe8972e912c0a642ead2698
135
134
2019-02-08T22:06:42Z
Weiler
3
/* API Access and Secret Keys */
wikitext
text/x-wiki
__TOC__
== Getting AWS (Amazon Web Services) Access ==
The Genomics Institute has a series of AWS Accounts that all support different projects. Often if you become associated with one or more of those projects, you will need access to that account or accounts. The way we are managing AWS IAM Account Access is that we have one AWS account that is the 'top level' account that everyone gets access to, and then, once you log in there, you can "Switch Role" into another sub-account that you are running things in.
To get access, you will need your PI or Project Manager to email cluster-admin (cluster-admin@soe.ucsc.edu) asking for an AWS account for you, and also in that email to name the projects you will have access to. The cluster-admin group will contact you with your credentials to login. Once you login, you can change your password if you want to and also you will be able to set up MFA (Multi Factor Authentication) for your account. You will be required to use MFA in order to "Switch Role" into any of the sub-accounts for the projects you are working on.
The login URL to use when logging in to the top level account is listed below. The top level account is known as "gi-gateway":
https://gi-gateway.signin.aws.amazon.com/console
When you login, you '''may''' see a couple error messages on the AWS dashboard saying you don't have access to view certain resources - '''this is normal''', so just ignore the error messages.
== Configuring Account Credentials ==
Once you login to the gi-gateway, you will have very few permissions to do anything there - which is normal, since you will not be working in that account anyway. The gi-gateway account is just there to authenticate you to AWS.
'''Changing Your Password'''
You can change your password by clicking on your username on the top right of the web browser window, just to the right of the little bell. If your username is bill@ucsc.edu, for example:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window.
* Click the '''"My Security Credentials"''' drop-down menu option.
* Click the '''"Change Password"''' button to change your password.
Note that we have a password strength policy in place, so your password must conform to the following requirements:
* Your password must be at least 10 characters long
* Your password must contain at least one lowercase letter
* Your password must contain at least one non-alphanumeric character
* Your password must contain at least one number
You will also need to configure MFA on your account before you will be allowed to switch roles into another account.
'''Configuring MFA'''
To configure MFA (Multi Factor Authentication), the most common way to do it is to use '''Google Authenticator''', which is an app available for Apple and Android based cell phones and mobile devices. The app is free, simply download it from the app store to your cell phone or tablet to get started. Other MFA apps may also work but we have not tested everything out there.
Once you have Google Authenticator installed, log into the gi-gateway account using the above URL, then:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"My Security Credentials"''' drop-down menu option.
* Scroll down to the MFA (Multi-Factor Authentication) section of the page, and click '''"Assign MFA Device"'''.
* In the following menu select '''"Virtual MFA Device"'''.
* In the following window click the '''"Show QR Code"''' link, and the MFA QR barcode will appear on your screen.
* Open the Google Authenticator app on your mobile device, and tap the "+" symbol in the top right corner of the app to add an account.
* Select "Scan Barcode" in the Google Authenticator app to continue, and aim your mobile device's camera at the QR barcode.
* The new MFA device should then be set up, and you should see a 6-digit number with a small timer to the right of it. Type the 6-digit code it displays into your web browser when asked, then wait for the next code to appear after the timer expires, and type that into the second field. It should then confirm that you have successfully associated an MFA device with your account.
Once you have associated an MFA device with the 'gi-gateway' account, '''log out''', then log back in. You will be asked for your username and password, and then for your MFA code, which you can read from Google Authenticator; the code changes every 30 seconds. '''You must log out and log back in using MFA before you will be able to switch roles!'''
== Switching Roles into Another AWS Account ==
Now that you have configured a password and enabled MFA, you will be allowed to "Switch Roles" into another account so you can begin work there. The first time you switch roles into an account it will ask you a few questions; subsequently it will remember which roles you have access to, and they will appear as menu items you can click to switch roles quickly. Let's assume that you want to switch to the 'pangenomics' AWS account and that the cluster-admin group has already granted you access. After logging into the 'gi-gateway' account at the URL listed here (same as above):
https://gi-gateway.signin.aws.amazon.com/console
Do the following to switch roles into the 'pangenomics' account (as an example):
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"Switch Role"''' option in the drop-down menu.
* In the following menu it will ask you about the role you will be assuming. In our example we will use the following:
** Account* = pangenomics
** Role* = developer
** Display Name = [leave blank, or use a short phrase]
** Color = [choose a color for this role]
* Then click the "Switch Role" button.
If all went well, you will land in the 'pangenomics' account, identified in the top right-hand corner of the page as '''"developer @ pangenomics"''', indicating your role and the account you are active in. You can then work as normal in that account. If you have not yet been given access to that role, you will receive an error message and will not be allowed to switch roles.
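The console also supports direct switch-role links that you can bookmark to skip the menu steps above (a convenience sketch; 'pangenomics' and 'developer' are the example account alias and role name from the steps above):

```shell
# Bookmarkable switch-role link for the example account/role above.
# Substitute your own account alias and role name as appropriate.
echo "https://signin.aws.amazon.com/switchrole?account=pangenomics&roleName=developer"
```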
'''NOTE:''' When you switch roles, you may be placed in a region you don't expect. Always verify the region you are in by looking at the top right of the web page, where your region is displayed. Most of our resources exist in "Oregon" (us-west-2), but some items appear in other regions on a case-by-case basis.
If you wish to switch context back to the 'gi-gateway' account in order to manage something, or to switch to another role in another account, simply:
* Click '''"developer @ pangenomics"''' in the top right corner of the window.
* Select '''"Back to bill@ucsc.edu"'''
You will then be sent back to the 'gi-gateway' context, where you can add another role to switch into, manage your credentials, or switch roles again.
== API Access and Secret Keys ==
If you require programmatic access to AWS, you are probably familiar with the AWS concept of Access Keys and Secret Keys, which scripts can use to authenticate to the AWS APIs without going through the web console. In the past, access keys and secret keys could be used with no further authentication. That was a security risk, as those keys had to be carefully guarded: anyone who obtained your keys could rack up charges on your AWS account without your knowledge!
Under the "Assume Role" mechanism we now use, Access Keys and Secret Keys can still be created, '''but only while logged into the gi-gateway account'''. Do not try to create keys while you have "Switched Roles" into another account. Keys you create in the top level 'gi-gateway' account will work for you in any sub-account you have access to switch roles into. You will, however, need to do a little more configuration before your keys work from a UNIX command line.
Assuming you have the '''"awscli"''' command line tool installed (installation is outside the scope of this document), configure it using the steps outlined in this document:
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
That document has a lot of other really useful information in it; if you plan on using keys for API access, we highly recommend reading it all the way through.
Note that we recommend awscli version 1.16.187 or later, as earlier versions have documented issues with using profiles and MFA-related actions. You can determine your version of awscli by doing:
aws --version
Generally, if you plan on using keys for API access, you will at minimum need to configure the "aws" utility and then tweak the config a bit for our setup. To start, run "aws configure". It should look something like this:
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]:
Most folks do that to start. It creates two files:
~/.aws/config
~/.aws/credentials
These two files are what the 'aws' command uses to access AWS.
'''~/.aws/credentials'''
This file contains your access key and secret key, and should not need to be modified after running 'aws configure'. Your same keys can be used to access any roles in any accounts you have access to.
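For reference, after running 'aws configure' the credentials file will look like this (the keys shown are the same documentation example values used in the 'aws configure' transcript above, not real keys):

```ini
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```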
'''~/.aws/config'''
This file contains account information that you will need to tweak for our setup. You will likely want a configuration like this:
[default]
region = us-west-2
[profile pangenomics-developer]
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/bill@ucsc.edu
duration_seconds = 43200
The "role_arn" line contains the role and account number you are accessing. You can see a list of live account numbers here:
[[AWS Account List and Numbers]]
Find the account number you need and enter it on the role_arn line, along with the role name. The cluster-admin group will give you the role name when you are granted access.
The 'mfa_serial' line contains the identifier for your MFA device. It will always look like '''"arn:aws:iam::652235167018:mfa/[your_iam_username]"'''. The account number listed there will always be "652235167018" because that is the account number of the top level "gi-gateway" account.
The "duration_seconds" parameter sets your session token lifetime to 43200 seconds (12 hours), so the CLI will only ask you to authenticate with MFA once every 12 hours rather than on every command. 12 hours is the maximum you can request, although you can specify less than that.
Once that is configured, you should be able to reference the profile you just created when using the aws command, like so:
$ aws s3 ls --profile pangenomics-developer
It will ask you for your MFA code and then run the command. Once you enter the MFA code, the token it creates is valid for 12 hours if you specified "duration_seconds = 43200" (if you omitted that line, the default session duration is one hour), so you can run other 'aws' CLI commands without re-authenticating with MFA for the duration of the session. After the session expires, you will need to authenticate via MFA again.
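Rather than passing --profile on every command, you can also select the profile once per shell session via the AWS_PROFILE environment variable, which the awscli honors (the profile name below is the example one configured above):

```shell
# Select the profile once; every subsequent 'aws' command in this shell
# then assumes the pangenomics 'developer' role automatically.
export AWS_PROFILE=pangenomics-developer

# Guarded so this sketch is a no-op on machines without the awscli installed.
if command -v aws >/dev/null 2>&1; then
    aws s3 ls                      # same as: aws s3 ls --profile pangenomics-developer
    aws sts get-caller-identity    # shows which role/account you are acting as
fi
```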
48e248560f22bcbc32aa1f44a63873d48eed0004
141
135
2019-04-26T20:13:37Z
Weiler
3
wikitext
text/x-wiki
__TOC__
== Getting AWS (Amazon Web Services) Access ==
The Genomics Institute has a series of AWS Accounts that all support different projects. Often if you become associated with one or more of those projects, you will need access to that account or accounts. The way we are managing AWS IAM Account Access is that we have one AWS account that is the 'top level' account that everyone gets access to, and then, once you log in there, you can "Switch Role" into another sub-account that you are running things in.
To get access, you will need your PI or Project Manager to email cluster-admin (cluster-admin@soe.ucsc.edu) asking for an AWS account for you, and also in that email to name the projects you will have access to. The cluster-admin group will contact you with your credentials to login. Once you login, you can change your password if you want to and also you will be able to set up MFA (Multi Factor Authentication) for your account. You will be required to use MFA in order to "Switch Role" into any of the sub-accounts for the projects you are working on.
The login URL to use when logging in to the top level account is listed below. The top level account is known as "gi-gateway":
https://gi-gateway.signin.aws.amazon.com/console
When you login, you '''may''' see a couple error messages on the AWS dashboard saying you don't have access to view certain resources - '''this is normal''', so just ignore the error messages.
== Configuring Account Credentials ==
Once you login to the gi-gateway, you will have very few permissions to do anything there - which is normal, since you will not be working in that account anyway. The gi-gateway account is just there to authenticate you to AWS.
'''Changing Your Password'''
You can change your password by clicking on your username on the top right of the web browser window, just to the right of the little bell. If your username is bill@ucsc.edu, for example:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window.
* Click the '''"My Security Credentials"''' drop-down menu option.
* Click the '''"Change Password"''' button to change your password.
Note that we have a password strength policy in place, so your password must conform to the following requirements:
* Your password must be at least 10 characters long
* Your password must contain at least one lowercase letter
* Your password must contain at least one non-alphanumeric character
* Your password must contain at least one number
You will also need to configure MFA on your account before you will be allowed to switch roles into another account.
'''Configuring MFA'''
To configure MFA (Multi Factor Authentication), the most common way to do it is to use '''Google Authenticator''', which is an app available for Apple and Android based cell phones and mobile devices. The app is free, simply download it from the app store to your cell phone or tablet to get started. Other MFA apps may also work but we have not tested everything out there.
Once you have Google Authenticator installed, log into the gi-gateway account using the above URL, then:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"My Security Credentials"''' drop-down menu option.
* Scroll down to the MFA (Multi-Factor Authentication) section of the page, and click '''"Assign MFA Device"'''.
* In the following menu select '''"Virtual MFA Device"'''.
* In the following window click the '''"Show QR Code"''' link, and the MFA QR barcode will appear on your screen.
* Open the Google Authenticator app on your mobile device, and click the little "+" symbol in the top right corner
of the app to add an account.
* You will then need to select "Scan Barcode" in the Google Authenticaor app to continue, and aim your mobile device camera
at the QR barcode.
* The new account MFA device should then be set up and you should see a 6 digit number with a small timer to the right of it.
You must type in one 6 digit code that it displays into your web browser when asked, then wait for the next code to appear
after the timer expires, and type that into the second field. It should then inform you that you have successfully associated
an MFA device with your account.
Once you have associated an MFA device with the 'gi-gateway' Account, '''log out''', then log back in. It will ask for your username and password, and then ask for your MFA code, which you can view by opening Google Authenticator and seeing what code it is displaying at that time. The code changes every 30 seconds or so. '''You must log out first and log back in using MFA in order to be able to switch roles!!!'''
== Switching Roles into Another AWS Account ==
Now that you have configured a password and enabled MFA, you will be allowed to "Switch Roles" into another account such that you can begin work there. The first time you switch roles into an account it will ask you a few questions, but subsequently it will remember which roles you have access to and they will become a menu item you can click on to quickly switch roles. Let's assume that you want to switch to the 'pangenomics' AWS account, and you have been already granted access to do so by the cluster-admin group. After logging into the 'gi-gateway' account at the URL listed here (same as above):
https://gi-gateway.signin.aws.amazon.com/console
Do the following to switch roles into the 'pangenomics' account (as an example):
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"Switch Role"''' option in the drop-down menu.
* In the following menu it will ask you about the role you will be assuming. In our example we will use the following:
-Account* = pangenomics
-Role* = developer
-Display Name = [leave blank, or use a short phrase]
-Color = [choose a color for this role]
* Then click the "Switch Role" button.
If all went well you should be dumped into the 'pangenomics' account, and you should be identified in the top right hand corner of the page as '''"developer @ pangenomics"''', indicating your role and the account you are active in. You can then work as normal in that account. If you have not yet been given access to that role, you will receive an error message and not be allowed to switch roles.
'''NOTE:''' When you switch roles, it may dump you into a region that you don't expect it to. Always verify the region you are in by looking at the top right of the web page - it will display your region there. Most of our stuff exists in "Oregon" (us-west-2), but some items appear in other regions on a per-case basis.
If you wish to switch context back to the 'gi-gateway' account in order to manage something, or to switch to another role in another account, simple:
* Click '''"developer @ pangenomics"''' in the top right corner of the window.
* Select '''"Back to bill@ucsc.edu"'''
You will then be sent back to the 'gi-gateway' context, and you can add another role to switch into, manage your credentials and further switch roles.
== API Access and Secret Keys ==
If you require programmatic access to AWS, you will very likely be familiar with the AWS concept of Access Keys and Secret Keys, which can be used by scripts to authenticate yourself to AWS and use the APIs there without using the web console to authenticate. In the past, access keys and secret keys could be used by users with no further authentication. This introduces a security risk, as the management of those keys must be carefully guarded - if anyone gets your keys, they can rack up charges on your AWS account without your knowledge!
Using the "Assume Role" mechanism we are now using, Access Keys and Secret Keys can still be created by users '''while logged into the gi-gateway account only'''. Do not try to create keys while you have "Switched Roles" into another account. Keys you create in the top level 'gi-gateway' account will work for you in any sub-account you have access to switch roles to. You will need to do a little more configuration for your keys to work from a UNIX command line however.
When using the '''"awscli"''' command line tool, assuming you have it installed (the process of which is outside the scope of this document), you would use the steps outlined in this document to configure it:
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
That document has a lot of other really useful information in it - if you plan on using keys for API access, we highly recommend reading it through.
It should be noted that we recommend awscli version 1.16.x or later, as earlier versions have documented issues with using profiles and MFA related actions. You can determine your version of awscli by doing:
aws --version
Generically, if you plan on using keys for API Access, minimally you will need to configure the "aws" utility and then tweak the config a bit for our setup. To start, run "aws configure". It should look something like this:
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]:
Most folks do that to start. It creates two files:
~/.aws/config
~/.aws/credentials
Those two files are important to access AWS via the 'aws' command.
'''~/.aws/credentials'''
This file contains your access key and secret key, and should not need to be modified after running 'aws configure'. Your same keys can be used to access any roles in any accounts you have access to.
'''~/.aws/config'''
This file contains some account information you will need to tweak. You may want to configure something like this:
[default]
region = us-west-2
[profile pangenomics-developer]
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/bill@ucsc.edu
duration_seconds = 43200
The "role_arn" line contains the role and account number you are accessing. You can see a list of live account numbers here:
[[AWS Account List and Numbers]]
Find the account number you need and enter it on the role_arn line, as well as the role name. You will get the role name from the cluster-admin group when you get access.
The 'mfa_serial' line contains the identifier for your MFA device. It will always look like '''"arn:aws:iam::652235167018:mfa/[your_iam_username]"'''. The account number listed there will always be "652235167018" because that is the account number of the top level "gi-gateway" account.
The "duration_seconds" parameter says that your session token will be 43200 seconds long (12 hours). That means you will only have to authenticate with MFA once every 12 hours. 12 hours is the maximum you can request, although you can specify less than that. This means it won't ask you for MFA every time you run a command for the next 12 hours.
Once that is configured, you should be able to reference the profile you just created when using the aws command, like so:
$ aws s3 ls --profile pangenomics-developer
It will ask you for your MFA code and then run the command. Once you enter the MFA code, the token it creates will be valid for 12 hours if you specified "duration_seconds = 43200", or if you omitted that line, the default session duration is one hour, so you can run other 'aws' cli commands without the need to re-authenticate with MFA for the duration of the session. After the session expires, you will need to authenticate via MFA again.
79012bcb0fc944278e1c1bbea0f9cf6490475f5d
142
141
2019-06-26T22:39:00Z
Weiler
3
wikitext
text/x-wiki
__TOC__
== Getting AWS (Amazon Web Services) Access ==
The Genomics Institute has a series of AWS Accounts that all support different projects. Often if you become associated with one or more of those projects, you will need access to that account or accounts. The way we are managing AWS IAM Account Access is that we have one AWS account that is the 'top level' account that everyone gets access to, and then, once you log in there, you can "Switch Role" into another sub-account that you are running things in.
To get access, you will need your PI or Project Manager to email cluster-admin (cluster-admin@soe.ucsc.edu) asking for an AWS account for you, and also in that email to name the projects you will have access to. The cluster-admin group will contact you with your credentials to login. Once you login, you can change your password if you want to and also you will be able to set up MFA (Multi Factor Authentication) for your account. You will be required to use MFA in order to "Switch Role" into any of the sub-accounts for the projects you are working on.
The login URL to use when logging in to the top level account is listed below. The top level account is known as "gi-gateway":
https://gi-gateway.signin.aws.amazon.com/console
When you login, you '''may''' see a couple error messages on the AWS dashboard saying you don't have access to view certain resources - '''this is normal''', so just ignore the error messages.
== Configuring Account Credentials ==
Once you login to the gi-gateway, you will have very few permissions to do anything there - which is normal, since you will not be working in that account anyway. The gi-gateway account is just there to authenticate you to AWS.
'''Changing Your Password'''
You can change your password by clicking on your username on the top right of the web browser window, just to the right of the little bell. If your username is bill@ucsc.edu, for example:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window.
* Click the '''"My Security Credentials"''' drop-down menu option.
* Click the '''"Change Password"''' button to change your password.
Note that we have a password strength policy in place, so your password must conform to the following requirements:
* Your password must be at least 10 characters long
* Your password must contain at least one lowercase letter
* Your password must contain at least one non-alphanumeric character
* Your password must contain at least one number
You will also need to configure MFA on your account before you will be allowed to switch roles into another account.
'''Configuring MFA'''
To configure MFA (Multi Factor Authentication), the most common way to do it is to use '''Google Authenticator''', which is an app available for Apple and Android based cell phones and mobile devices. The app is free, simply download it from the app store to your cell phone or tablet to get started. Other MFA apps may also work but we have not tested everything out there.
Once you have Google Authenticator installed, log into the gi-gateway account using the above URL, then:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"My Security Credentials"''' drop-down menu option.
* Scroll down to the MFA (Multi-Factor Authentication) section of the page, and click '''"Assign MFA Device"'''.
* In the following menu select '''"Virtual MFA Device"'''.
* In the following window click the '''"Show QR Code"''' link, and the MFA QR barcode will appear on your screen.
* Open the Google Authenticator app on your mobile device, and click the little "+" symbol in the top right corner
of the app to add an account.
* You will then need to select "Scan Barcode" in the Google Authenticaor app to continue, and aim your mobile device camera
at the QR barcode.
* The new account MFA device should then be set up and you should see a 6 digit number with a small timer to the right of it.
You must type in one 6 digit code that it displays into your web browser when asked, then wait for the next code to appear
after the timer expires, and type that into the second field. It should then inform you that you have successfully associated
an MFA device with your account.
Once you have associated an MFA device with the 'gi-gateway' Account, '''log out''', then log back in. It will ask for your username and password, and then ask for your MFA code, which you can view by opening Google Authenticator and seeing what code it is displaying at that time. The code changes every 30 seconds or so. '''You must log out first and log back in using MFA in order to be able to switch roles!!!'''
== Switching Roles into Another AWS Account ==
Now that you have configured a password and enabled MFA, you will be allowed to "Switch Roles" into another account such that you can begin work there. The first time you switch roles into an account it will ask you a few questions, but subsequently it will remember which roles you have access to and they will become a menu item you can click on to quickly switch roles. Let's assume that you want to switch to the 'pangenomics' AWS account, and you have been already granted access to do so by the cluster-admin group. After logging into the 'gi-gateway' account at the URL listed here (same as above):
https://gi-gateway.signin.aws.amazon.com/console
Do the following to switch roles into the 'pangenomics' account (as an example):
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"Switch Role"''' option in the drop-down menu.
* In the following menu it will ask you about the role you will be assuming. In our example we will use the following:
-Account* = pangenomics
-Role* = developer
-Display Name = [leave blank, or use a short phrase]
-Color = [choose a color for this role]
* Then click the "Switch Role" button.
If all went well you should be dumped into the 'pangenomics' account, and you should be identified in the top right hand corner of the page as '''"developer @ pangenomics"''', indicating your role and the account you are active in. You can then work as normal in that account. If you have not yet been given access to that role, you will receive an error message and not be allowed to switch roles.
'''NOTE:''' When you switch roles, it may dump you into a region that you don't expect it to. Always verify the region you are in by looking at the top right of the web page - it will display your region there. Most of our stuff exists in "Oregon" (us-west-2), but some items appear in other regions on a per-case basis.
If you wish to switch context back to the 'gi-gateway' account in order to manage something, or to switch to another role in another account, simple:
* Click '''"developer @ pangenomics"''' in the top right corner of the window.
* Select '''"Back to bill@ucsc.edu"'''
You will then be sent back to the 'gi-gateway' context, and you can add another role to switch into, manage your credentials and further switch roles.
== API Access and Secret Keys ==
If you require programmatic access to AWS, you will very likely be familiar with the AWS concept of Access Keys and Secret Keys, which can be used by scripts to authenticate yourself to AWS and use the APIs there without using the web console to authenticate. In the past, access keys and secret keys could be used by users with no further authentication. This introduces a security risk, as the management of those keys must be carefully guarded - if anyone gets your keys, they can rack up charges on your AWS account without your knowledge!
Using the "Assume Role" mechanism we are now using, Access Keys and Secret Keys can still be created by users '''while logged into the gi-gateway account only'''. Do not try to create keys while you have "Switched Roles" into another account. Keys you create in the top level 'gi-gateway' account will work for you in any sub-account you have access to switch roles to. You will need to do a little more configuration for your keys to work from a UNIX command line however.
When using the '''"awscli"''' command line tool, assuming you have it installed (the process of which is outside the scope of this document), you would use the steps outlined in this document to configure it:
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
That document has a lot of other really useful information in it - if you plan on using keys for API access, we highly recommend reading it through.
It should be noted that we recommend awscli version 1.16.187 or later, as earlier versions have documented issues with using profiles and MFA related actions. You can determine your version of awscli by doing:
aws --version
Generically, if you plan on using keys for API Access, minimally you will need to configure the "aws" utility and then tweak the config a bit for our setup. To start, run "aws configure". It should look something like this:
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]:
Most folks do that to start. It creates two files:
~/.aws/config
~/.aws/credentials
Those two files are important to access AWS via the 'aws' command.
'''~/.aws/credentials'''
This file contains your access key and secret key, and should not need to be modified after running 'aws configure'. Your same keys can be used to access any roles in any accounts you have access to.
'''~/.aws/config'''
This file contains some account information you will need to tweak. You may want to configure something like this:
[default]
region = us-west-2
[profile pangenomics-developer]
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/bill@ucsc.edu
duration_seconds = 43200
The "role_arn" line contains the role and account number you are accessing. You can see a list of live account numbers here:
[[AWS Account List and Numbers]]
Find the account number you need and enter it on the role_arn line, as well as the role name. You will get the role name from the cluster-admin group when you get access.
The 'mfa_serial' line contains the identifier for your MFA device. It will always look like '''"arn:aws:iam::652235167018:mfa/[your_iam_username]"'''. The account number listed there will always be "652235167018" because that is the account number of the top level "gi-gateway" account.
The "duration_seconds" parameter says that your session token will be 43200 seconds long (12 hours). That means you will only have to authenticate with MFA once every 12 hours. 12 hours is the maximum you can request, although you can specify less than that. This means it won't ask you for MFA every time you run a command for the next 12 hours.
Once that is configured, you should be able to reference the profile you just created when using the aws command, like so:
$ aws s3 ls --profile pangenomics-developer
It will ask you for your MFA code and then run the command. Once you enter the MFA code, the token it creates will be valid for 12 hours if you specified "duration_seconds = 43200", or if you omitted that line, the default session duration is one hour, so you can run other 'aws' cli commands without the need to re-authenticate with MFA for the duration of the session. After the session expires, you will need to authenticate via MFA again.
d0950e15e6ec5728bf50b70cb48a107610ad2219
147
142
2019-09-04T19:41:39Z
Weiler
3
wikitext
text/x-wiki
__TOC__
== Getting AWS (Amazon Web Services) Access ==
The Genomics Institute has a series of AWS Accounts that all support different projects. Often if you become associated with one or more of those projects, you will need access to that account or accounts. The way we are managing AWS IAM Account Access is that we have one AWS account that is the 'top level' account that everyone gets access to, and then, once you log in there, you can "Switch Role" into another sub-account that you are running things in.
To get access, you will need your PI or Project Manager to email cluster-admin (cluster-admin@soe.ucsc.edu) asking for an AWS account for you, and also in that email to name the projects you will have access to. The cluster-admin group will contact you with your credentials to login. Once you login, you can change your password if you want to and also you will be able to set up MFA (Multi Factor Authentication) for your account. You will be required to use MFA in order to "Switch Role" into any of the sub-accounts for the projects you are working on.
The login URL to use when logging in to the top level account is listed below. The top level account is known as "gi-gateway":
https://gi-gateway.signin.aws.amazon.com/console
When you login, you '''may''' see a couple of error messages on the AWS dashboard saying you don't have access to view certain resources - '''this is normal''', so just ignore them.
== Configuring Account Credentials ==
Once you login to the gi-gateway, you will have very few permissions to do anything there - which is normal, since you will not be working in that account anyway. The gi-gateway account is just there to authenticate you to AWS.
'''Changing Your Password'''
You can change your password by clicking on your username on the top right of the web browser window, just to the right of the little bell. If your username is bill@ucsc.edu, for example:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window.
* Click the '''"My Security Credentials"''' drop-down menu option.
* Click the '''"Change Password"''' button to change your password.
Note that we have a password strength policy in place, so your password must conform to the following requirements:
* Your password must be at least 10 characters long
* Your password must contain at least one lowercase letter
* Your password must contain at least one non-alphanumeric character
* Your password must contain at least one number
You will also need to configure MFA on your account before you will be allowed to switch roles into another account.
'''Configuring MFA'''
The most common way to configure MFA (Multi Factor Authentication) is with '''Google Authenticator''', an app available for Apple and Android cell phones and mobile devices. The app is free; simply download it from the app store to your cell phone or tablet to get started. Other MFA apps may also work, but we have not tested everything out there.
Once you have Google Authenticator installed, log into the gi-gateway account using the above URL, then:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"My Security Credentials"''' drop-down menu option.
* Scroll down to the MFA (Multi-Factor Authentication) section of the page, and click '''"Assign MFA Device"'''.
* In the following menu select '''"Virtual MFA Device"'''.
* In the following window click the '''"Show QR Code"''' link, and the MFA QR barcode will appear on your screen.
* Open the Google Authenticator app on your mobile device, and click the little "+" symbol in the top right corner of the app to add an account.
* Select "Scan Barcode" in the Google Authenticator app to continue, and aim your mobile device camera at the QR barcode.
* The new account MFA device should then be set up and you should see a 6 digit number with a small timer to the right of it. Type the 6 digit code it displays into your web browser when asked, then wait for the next code to appear after the timer expires, and type that into the second field. It should then inform you that you have successfully associated an MFA device with your account.
Once you have associated an MFA device with the 'gi-gateway' Account, '''log out''', then log back in. It will ask for your username and password, and then ask for your MFA code, which you can view by opening Google Authenticator and seeing what code it is displaying at that time. The code changes every 30 seconds or so. '''You must log out first and log back in using MFA in order to be able to switch roles!!!'''
== Switching Roles into Another AWS Account ==
Now that you have configured a password and enabled MFA, you will be allowed to "Switch Roles" into another account such that you can begin work there. The first time you switch roles into an account it will ask you a few questions, but subsequently it will remember which roles you have access to and they will become a menu item you can click on to quickly switch roles. Let's assume that you want to switch to the 'pangenomics' AWS account, and you have been already granted access to do so by the cluster-admin group. After logging into the 'gi-gateway' account at the URL listed here (same as above):
https://gi-gateway.signin.aws.amazon.com/console
Do the following to switch roles into the 'pangenomics' account (as an example):
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"Switch Role"''' option in the drop-down menu.
* In the following menu it will ask you about the role you will be assuming. In our example we will use the following:
** Account* = pangenomics
** Role* = developer
** Display Name = [leave blank, or use a short phrase]
** Color = [choose a color for this role]
* Then click the "Switch Role" button.
If all went well you should be dumped into the 'pangenomics' account, and you should be identified in the top right hand corner of the page as '''"developer @ pangenomics"''', indicating your role and the account you are active in. You can then work as normal in that account. If you have not yet been given access to that role, you will receive an error message and not be allowed to switch roles.
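Once you know the account and role names, you can also bookmark a direct switch-role link instead of filling in the form each time. This uses AWS's standard switch-role URL format; the account alias and role below match the example above, and the displayName is just a label of your choosing:

```
https://signin.aws.amazon.com/switchrole?account=pangenomics&roleName=developer&displayName=pangenomics-developer
```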
'''NOTE:''' When you switch roles, it may dump you into a region that you don't expect it to. Always verify the region you are in by looking at the top right of the web page - it will display your region there. Most of our stuff exists in "Oregon" (us-west-2), but some items appear in other regions on a per-case basis.
If you wish to switch context back to the 'gi-gateway' account in order to manage something, or to switch to another role in another account, simply:
* Click '''"developer @ pangenomics"''' in the top right corner of the window.
* Select '''"Back to bill@ucsc.edu"'''
You will then be sent back to the 'gi-gateway' context, where you can add another role to switch into, manage your credentials, or switch roles again.
== API Access and Secret Keys ==
If you require programmatic access to AWS, you will very likely be familiar with the AWS concept of Access Keys and Secret Keys, which can be used by scripts to authenticate yourself to AWS and use the APIs there without using the web console to authenticate. In the past, access keys and secret keys could be used by users with no further authentication. This introduces a security risk, as the management of those keys must be carefully guarded - if anyone gets your keys, they can rack up charges on your AWS account without your knowledge!
Using the "Assume Role" mechanism we are now using, Access Keys and Secret Keys can still be created by users '''while logged into the gi-gateway account only'''. Do not try to create keys while you have "Switched Roles" into another account. Keys you create in the top level 'gi-gateway' account will work for you in any sub-account you have access to switch roles to. You will need to do a little more configuration for your keys to work from a UNIX command line however.
When using the '''"awscli"''' command line tool, assuming you have it installed (the process of which is outside the scope of this document), you would use the steps outlined in this document to configure it:
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
That document has a lot of other really useful information in it - if you plan on using keys for API access, we highly recommend reading it through.
It should be noted that we recommend awscli version 1.16.187 or later, as earlier versions have documented issues with using profiles and MFA related actions. You can determine your version of awscli by doing:
aws --version
If you plan on using keys for API access, at minimum you will need to configure the "aws" utility and then tweak the config a bit for our setup. To start, run "aws configure". It should look something like this:
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]:
Most folks do that to start. It creates two files:
~/.aws/config
~/.aws/credentials
Those two files are important to access AWS via the 'aws' command.
'''~/.aws/credentials'''
This file contains your access key and secret key, and should not need to be modified after running 'aws configure'. Your same keys can be used to access any roles in any accounts you have access to.
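For reference, after running 'aws configure' the credentials file looks like this (the keys shown are AWS's documented example values, not real credentials):

```ini
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```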
'''~/.aws/config'''
This file contains some account information you will need to tweak. You may want to configure something like this:
[default]
region = us-west-2
[profile pangenomics-developer]
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/bill@ucsc.edu
duration_seconds = 43200
The "role_arn" line contains the role and account number you are accessing. You can see a list of live account numbers here:
[[AWS Account List and Numbers]]
Find the account number you need and enter it on the role_arn line, as well as the role name. You will get the role name from the cluster-admin group when you get access.
The 'mfa_serial' line contains the identifier for your MFA device. It will always look like '''"arn:aws:iam::652235167018:mfa/[your_iam_username]"'''. The account number listed there will always be "652235167018" because that is the account number of the top level "gi-gateway" account.
The "duration_seconds" parameter says that your session token will be 43200 seconds long (12 hours). That means you will only have to authenticate with MFA once every 12 hours. 12 hours is the maximum you can request, although you can specify less than that. This means it won't ask you for MFA every time you run a command for the next 12 hours.
Once that is configured, you should be able to reference the profile you just created when using the aws command, like so:
$ aws s3 ls --profile pangenomics-developer
It will ask you for your MFA code and then run the command. Once you enter the MFA code, the token it creates will be valid for 12 hours if you specified "duration_seconds = 43200"; if you omitted that line, the default session duration is one hour. For the duration of the session you can run other 'aws' cli commands without re-authenticating with MFA. After the session expires, you will need to authenticate via MFA again.
==Tag Your Resources==
When you start using AWS resources (instances, networks, etc), it is very important that you "tag" your resources with the "Owner" tag (note the capital "O"). "Owner" is the key, and the value assigned to it will be your IAM username (i.e. your email address). So, for example, if I spin up an instance, I would tag it during or after creation with something like:
Owner = bob@ucsc.edu
If you do not tag your instances, '''they will automatically be terminated within 10 minutes.''' Tag your instances especially, but tag every resource you create! This allows us to perform accounting tasks much more easily and lets the Program Managers know which resources are controlled by whom.
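As a sketch, here is what tagging an existing instance from the command line looks like, assuming the awscli profile setup described above. The instance ID and profile name are placeholders, and the command is echoed rather than executed because actually running it requires valid AWS credentials:

```shell
# Build the required "Owner" tag (key is capital-O, value is your IAM username).
owner="bob@ucsc.edu"
tag_spec="Key=Owner,Value=${owner}"

# The instance ID and profile below are placeholders; shown with 'echo' because
# actually running it requires valid AWS credentials for the target account.
echo aws ec2 create-tags \
    --profile pangenomics-developer \
    --region us-west-2 \
    --resources i-0123456789abcdef0 \
    --tags "$tag_spec"
```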
AWS Account List and Numbers
140
126
2019-04-11T16:55:56Z
Weiler
3
wikitext
text/x-wiki
This is a list of our currently available AWS accounts and their account numbers:
ucsc-bd2k : 862902209576
ucsc-toil-dev : 318423852362
ucsc-vg-dev : 781907127277
ucsc-platform-dev : 719818754276
comparative-genomics-dev : 162786355865
nanopore-dev : 270442831226
ucsc-cgp-production : 097093801910
platform-hca : 122796619775
anvil-dev : 608666466534
gi-gateway : 652235167018
pangenomics : 422448306679
braingeneers : 443872533066
ucsctreehouse : 238605363322
ucsc-bisti-dev : 851631505710
How to access the public servers
138
136
2019-02-22T22:47:56Z
Weiler
3
wikitext
text/x-wiki
== How to Gain Access to the Public Genomics Institute Compute Servers ==
If you need access to the Genomics Institute compute servers, please ask your PI or sponsor to email 'cluster-admin@soe.ucsc.edu' requesting that you be granted access. Then we can set up a quick meeting to create your account and go over the details.
== Account Expiration ==
Your UNIX account will have an expiration date associated with it after creation. If your account was created in January, February or March, then your account expiration date will be July 1st of the '''current year'''. If the account was created after March, then your expiration date will be July 1st of the '''following year'''.
You will receive notice by email when your account is about to expire. To renew, simply ask the PI that sponsored you (who will be named in the notice) to email '''cluster-admin@soe.ucsc.edu''' requesting that your account be renewed for another year.
If your account expires, the account will be suspended and you will no longer be able to login or view any data you may have in our systems. Any automated scripts (owned by you) that run via cron or other mechanisms will cease to function.
== Server Types and Management==
You can log into our public compute servers via SSH:
'''courtyard.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space
'''plaza.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space
These servers are running CentOS 7.5 Linux. They are managed by the Genomics Institute Cluster Admin group. If you need software installed on them, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory will be located at "/public/home/username" and has a 30GB quota. The group storage directories are created per PI, and each group directory has a 15TB quota. For example, if David Haussler is the PI that you report to directly, then the directory would exist as /public/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each of those group directories is shared by the lab it belongs to, so you must be mindful of everyone's data usage and share the 15TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the '/usr/bin/viewquota' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /public/groups/hausslerlab for example, you would do:
$ viewquota hausslerlab
Project quota on /export (/dev/mapper/export)
Project ID Used Soft Hard Warn/Grace
---------- ---------------------------------
hausslerlab 1.8T 15T 16T 00 [------]
== Actually Doing Work and Computing ==
When doing research, running jobs and the like, please be careful of your resource consumption on the server you are on. Don't run so many threads or processes at once that you exhaust the available RAM or disk IO. If you are not sure of your potential RAM, CPU or disk impact, start small with one or two processes and work your way up from there. Also, before running your jobs, check what is already happening on the server by using the 'top' command to see who else is running what and which resources are already being consumed. If, after starting a process, you realize that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Serving Files to the Public via the Web ==
If you want to set up a web page on courtyard, or serve files over HTTP from there, do this:
mkdir /public/home/''your_username''/public_html
chmod 755 /public/home/''your_username''/public_html
Put data in the public_html directory. The URL will be:
http://public.gi.ucsc.edu/''~username''/
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear in the event of a disk failure or anything else. Do not store important data there. If it is important, it should be moved somewhere else very soon after creation.
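A common pattern for using /scratch safely is to create a per-job directory and remove it when the job exits, so stale temporaries don't accumulate. A minimal sketch (with a /tmp fallback so the snippet also runs on machines without a /scratch filesystem):

```shell
# Create a per-job working directory on local scratch, cleaned up on exit.
scratch_root="/scratch"
[ -d "$scratch_root" ] || scratch_root="/tmp"   # fallback if /scratch is absent

workdir=$(mktemp -d "$scratch_root/${USER:-job}.XXXXXX")
trap 'rm -rf "$workdir"' EXIT                   # remove temporaries when done

echo "working in $workdir"
# ... write temporary files under "$workdir" here ...
```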
Access to the Firewalled Compute Servers
139
137
2019-02-22T22:48:53Z
Weiler
3
wikitext
text/x-wiki
Before you can access the firewalled environment (Prism), you must get VPN access to it, which is detailed here:
[[Requirement for users to get GI VPN access]]
== Account Expiration ==
Your UNIX account will have an expiration date associated with it after creation. If your account was created in January, February or March, then your account expiration date will be July 1st of the '''current year'''. If the account was created after March, then your expiration date will be July 1st of the '''following year'''.
You will receive notice by email when your account is about to expire. To renew, simply ask the PI that sponsored you (who will be named in the notice) to email '''cluster-admin@soe.ucsc.edu''' requesting that your account be renewed for another year.
If your account expires, the account will be suspended and you will no longer be able to login or view any data you may have in our systems. Any automated scripts (owned by you) that run via cron or other mechanisms will cease to function.
== Server Types and Management==
After confirming your VPN software is working, you can ssh into one of the compute servers behind the VPN:
'''crimson.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space
'''razzmatazz.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space
'''mustard.prism''': 1.5TB RAM, 160 cores, 9TB local scratch space
These servers are running CentOS 7.5 Linux. They are managed by the Genomics Institute Cluster Admin group. If you need software installed on either or both of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory will be located at "/private/home/username" and has a 30GB quota. The group storage directories are created per PI, and each group directory has a 15TB quota. For example, if David Haussler is the PI that you report to directly, then the directory would exist as /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each of those group directories is shared by the lab it belongs to, so you must be mindful of everyone's data usage and share the 15TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the '/usr/bin/viewquota' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /private/groups/hausslerlab for example, you would do:
$ viewquota hausslerlab
Project quota on /export (/dev/mapper/export)
Project ID Used Soft Hard Warn/Grace
---------- ---------------------------------
hausslerlab 1.8T 15T 16T 00 [------]
== Actually Doing Work and Computing ==
When doing research, running jobs and the like, please be careful of your resource consumption on the server you are on. Don't run so many threads or processes at once that you exhaust the available RAM or disk IO. If you are not sure of your potential RAM, CPU or disk impact, start small with one or two processes and work your way up from there. Also, before running your jobs, check what is already happening on the server by using the 'top' command to see who else is running what and which resources are already being consumed. If, after starting a process, you realize that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== The Firewall ==
All servers in this environment are behind a firewall, so you must connect to the VPN in order to access them. They will not be accessible from the greater Internet without VPN. You will, however, be able to connect outbound from them to other servers on the internet to copy data in, sync git repos, and so on - it is only inbound connections that are blocked. All machines behind the firewall have the private domain name suffix of "*.prism".
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear in the event of a disk failure or anything else. Do not store important data there. If it is important, it should be moved somewhere else very soon after creation.
Computational Genomics Kubernetes Installation
0
23
144
2019-09-04T19:24:57Z
Weiler
3
Created page with "__TOC__ The Computation Genomics Group has a Kubernetes Cluster running on several large instances in AWS. ==Getting Authorized to Connect== If you require access to this k..."
wikitext
text/x-wiki
__TOC__
The Computational Genomics Group has a Kubernetes cluster running on several large instances in AWS.
==Getting Authorized to Connect==
If you require access to this kubernetes cluster, contact Benedict Paten asking for permission to use it, then pass on that permission via email to:
cluster-admin@soe.ucsc.edu
Let us know which group you are with and we can authorize you to use the cluster in the correct namespace.
==Authenticating to Kubernetes==
We will authorize (authz) you to use the cluster on the server side, but you will also need to authenticate (authn) using your '@ucsc.edu' email address and a unique JSON Web Token (JWT). These credentials are installed in ~/.kube/config on whatever machine you are connecting from to reach the cluster.
To authenticate and get your base kubernetes configuration, go to this URL (below), which will ask you to authenticate to Google. Use your '@ucsc.edu' email address as the login. It will then ask you to authenticate via CruzID Gold if your web browser doesn't already have the authentication token cached:
https://cg-kube-auth.gi.ucsc.edu
Once you authenticate, it will pass you back to the 'https://cg-kube-auth.gi.ucsc.edu' website, which should confirm authentication at the top with the message "Successfully Authenticated". '''If you see any errors in red''' but are sure you typed your password and 2-factor auth correctly, click the above link again and authenticate a second time, which should work; there is a quirk where the web token doesn't always pass back to us correctly on the first try.
Upon success, you will be able to click the blue "Download Config File" button, which contains your initial kubernetes config file. Copy this file to your home directory as ~/.kube/config. Follow the directions on the web page to insert your '''"namespace:"''' line as directed. We will let you know which namespace to use.
==Testing Connectivity==
Once your ~/.kube/config file is set up correctly, you should be able to connect to the cluster. All our shared servers here at the Genomics Institute have the 'kubectl' command installed on them, but if you are coming from somewhere else, just make sure the "kubectl" utility is installed on that machine.
A quick test should go as follows:
$ kubectl get nodes
NAME          STATUS   ROLES    AGE   VERSION
k1.kube       Ready    <none>   13h   v1.15.3
k2.kube       Ready    <none>   13h   v1.15.3
master.kube   Ready    master   13h   v1.15.3
c3b3cfb0dd4dbb59ec563e48f09d09e5a6f8efe9
153
152
2019-09-06T19:38:16Z
Weiler
3
wikitext
text/x-wiki
__TOC__
The Computational Genomics Group has a Kubernetes Cluster running on several large instances in AWS. The current cluster makeup includes two worker nodes, each with the following specs:
* 96 CPU cores (3.1 GHz)
* 384 GB RAM
* 3.3 TB Local NVMe Flash Storage
* 25 Gb/s Network Interface
==Getting Authorized to Connect==
If you require access to this kubernetes cluster, contact Benedict Paten to ask for permission to use it, then forward that permission via email to:
cluster-admin@soe.ucsc.edu
Let us know which group you are with and we can authorize you to use the cluster in the correct namespace.
==Authenticating to Kubernetes==
We will authorize (authz) you to use the cluster on the server side, but you will also need to authenticate (authn) using your '@ucsc.edu' email address and a unique JSON Web Token (JWT). These credentials are installed in ~/.kube/config on whatever machine you are connecting from to reach the cluster.
To authenticate and get your base kubernetes configuration, go to this URL (below), which will ask you to authenticate to Google. Use your '@ucsc.edu' email address as the login. It will then ask you to authenticate via CruzID Gold if your web browser doesn't already have the authentication token cached:
https://cg-kube-auth.gi.ucsc.edu
Once you authenticate (via username/password and 2-factor auth for CruzID Gold), it will pass you back to the 'https://cg-kube-auth.gi.ucsc.edu' website, which should confirm authentication at the top with the message "Successfully Authenticated". '''If you see any errors in red''' but are sure you typed your password and 2-factor auth correctly, click the above link again (https://cg-kube-auth.gi.ucsc.edu) and authenticate a second time, which should work; there is a quirk where the web token doesn't always pass back to us correctly on the first try.
Upon success, you will be able to click the blue "Download Config File" button, which contains your initial kubernetes config file. Copy this file to your home directory as ~/.kube/config. Follow the directions on the web page to insert your '''"namespace:"''' line as directed. We will let you know which namespace to use.
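For orientation, the "namespace:" line ends up inside the context entry of ~/.kube/config. The fragment below is illustrative only; the cluster, user, and context names come from your downloaded config file, and "your-namespace" stands in for the namespace we assign you.

```yaml
# Illustrative fragment of ~/.kube/config (names are placeholders;
# use the values from your downloaded config file):
contexts:
- context:
    cluster: cg-kube
    user: you@ucsc.edu
    namespace: your-namespace   # the namespace we assign to your group
  name: cg-kube
current-context: cg-kube
```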
==Testing Connectivity==
Once your ~/.kube/config file is set up correctly, you should be able to connect to the cluster. All our shared servers here at the Genomics Institute have the 'kubectl' command installed on them, but if you are coming from somewhere else, just make sure the "kubectl" utility is installed on that machine.
A quick test should go as follows:
$ kubectl get nodes
NAME          STATUS   ROLES    AGE   VERSION
k1.kube       Ready    <none>   13h   v1.15.3
k2.kube       Ready    <none>   13h   v1.15.3
master.kube   Ready    master   13h   v1.15.3
==Running Pods and Jobs with Requests and Limits==
When running jobs and pods on kubernetes, you will always want to specify "requests" and "limits" on resources; otherwise your pods will get stuck with the default limits, which are tiny (to protect against runaway pods). You should always have an idea of how many resources your jobs will consume, and not request much more than that, so you don't hog all the resources. Limits also prevent your job from "running away" unexpectedly and chewing up more resources than expected.
Here is a good example of a job file that specifies limits:
job.yml
apiVersion: batch/v1
kind: Job
metadata:
  name: $USER-$TS
spec:
  backoffLimit: 0
  ttlSecondsAfterFinished: 30
  template:
    spec:
      containers:
      - name: magic
        image: robcurrie/ubuntu
        imagePullPolicy: Always
        resources:
          requests:
            cpu: "1"
            memory: "2G"
            ephemeral-storage: "2G"
          limits:
            cpu: "2"
            memory: "3G"
            ephemeral-storage: "3G"
        command: ["/bin/bash", "-c"]
        args: ['for i in {1..100}; do echo "$i: $(date)"; sleep 1; done']
      restartPolicy: Never
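Note that kubectl does not expand the $USER and $TS placeholders in job.yml itself. One way to handle this (an assumed local convention, not the only option) is to substitute them in the shell before applying; the helper below is hypothetical, with TS just a unique suffix so repeated runs get distinct Job names.

```shell
# kubectl does no variable substitution in YAML, so expand the
# $USER/$TS placeholders yourself before applying the file.
expand_job() {                      # usage: expand_job job.yml
  TS=${TS:-$(date +%Y%m%d-%H%M%S)} # default: timestamp-based suffix
  sed -e "s/[$]USER/$USER/g" -e "s/[$]TS/$TS/g" "$1"
}
# Typical use (on a machine with cluster access):
#   expand_job job.yml > job-expanded.yml
#   kubectl apply -f job-expanded.yml
```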
'''NOTE:''' Jobs and pods that '''completed''' over 48 hours ago but have not been cleaned up will be automatically removed by the garbage collector. Most jobs will have the "ttlSecondsAfterFinished" configuration item in them, so they will be automatically cleaned up after that time expires; still, leaving old pods and jobs around pins the disk space they were using for as long as they remain, so it's good to get rid of them as soon as they are done unless you are debugging a failure or something like that.
Jobs that run for over 48 hours will not be deleted; only jobs that '''exited''' over 48 hours ago are removed.
A lot of other good information, including examples and some "How To" documentation, can be found on Rob Currie's GitHub page:
https://github.com/rcurrie/kubernetes
==View the Cluster's Current Activity==
You can take a look at current resource consumption by taking a look at our Ganglia Cluster monitor tool:
https://ganglia.gi.ucsc.edu/
That website requires a username and password:
username: genecats
password: KiloKluster
That's mostly for keeping the scrip kiddies and bots from banging on it.
Once you get in, you should see a drop-down menu near the top left of the screen near "Genomics Institute Grid". From the drop-down menu, select "CG Kubernetes Cluster". It will take you to a page detailing the current resource usage and activity on the nodes. This can be useful for see if anyone else is using the whole cluster, or just to get an idea of how many resources are available for your batch of jobs to assign to the cluster.
b10c3696104743ef494cc61710d675a259166002
154
153
2019-09-06T19:39:03Z
Weiler
3
/* Running Pods and Jobs with Requests and Limits */
wikitext
text/x-wiki
__TOC__
The Computational Genomics Group has a Kubernetes cluster running on several large instances in AWS. The cluster currently includes three worker nodes, each with the following specs:
* 96 CPU cores (3.1 GHz)
* 384 GB RAM
* 3.3 TB Local NVMe Flash Storage
* 25 Gb/s Network Interface
==Getting Authorized to Connect==
If you require access to this Kubernetes cluster, contact Benedict Paten to ask for permission to use it, then forward that permission via email to:
cluster-admin@soe.ucsc.edu
Let us know which group you are with and we can authorize you to use the cluster in the correct namespace.
==Authenticating to Kubernetes==
We will authorize (authz) you to use the cluster on the server side, but you will also need to authenticate (authn) using your '@ucsc.edu' email address and a unique JSON Web Token. These credentials are installed in ~/.kube/config on whatever machine you are connecting from.
To authenticate and get your base kubernetes configuration, go to this URL (below), which will ask you to authenticate to Google. Use your '@ucsc.edu' email address as the login. It will then ask you to authenticate via CruzID Gold if your web browser doesn't already have the authentication token cached:
https://cg-kube-auth.gi.ucsc.edu
Once you authenticate (via username/password and 2-factor auth for CruzID Gold), you will be passed back to the 'https://cg-kube-auth.gi.ucsc.edu' website, which should confirm authentication at the top with the message "Successfully Authenticated". '''If you see any errors in red''' but are sure you typed your password and 2-factor auth correctly, click the link again (https://cg-kube-auth.gi.ucsc.edu) and authenticate a second time, which should work. There is a quirk where the web token doesn't always pass back to us correctly on the first try.
Upon success, you will be able to click the blue "Download Config File" button, which contains your initial kubernetes config file. Copy this file to your home directory as ~/.kube/config. Follow the directions on the web page to insert your '''"namespace:"''' line as directed. We will let you know which namespace to use.
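For reference, the "namespace:" line belongs under the context entry in your ~/.kube/config. A trimmed sketch of that stanza (the cluster, user, and namespace names here are placeholders, not the actual values in your downloaded file):

```yaml
# Fragment of ~/.kube/config -- only the contexts stanza is shown.
# "cg-kube" and "your-group" are illustrative names.
contexts:
- context:
    cluster: cg-kube
    user: yourname@ucsc.edu
    namespace: your-group    # <- the line to add, as directed
  name: cg-kube
```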
==Testing Connectivity==
Once your ~/.kube/config file is set up correctly, you should be able to connect to the cluster. All of our shared servers at the Genomics Institute have the 'kubectl' command installed; if you are connecting from somewhere else, make sure the "kubectl" utility is installed on that machine.
A quick test should go as follows:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k1.kube Ready <none> 13h v1.15.3
k2.kube Ready <none> 13h v1.15.3
k3.kube Ready <none> 13h v1.15.3
master.kube Ready master 13h v1.15.3
==Running Pods and Jobs with Requests and Limits==
When running jobs and pods on kubernetes, you will always want to specify "requests" and "limits" on resources; otherwise your pods will get stuck with the default limits, which are tiny (to protect against runaway pods). You should always have an idea of how many resources your jobs will consume, and avoid requesting much more than that, so you don't hog the cluster. It also prevents your job from "running away" unexpectedly and chewing up more resources than expected.
Here is a good example of a job file that specifies limits:
job.yml
apiVersion: batch/v1
kind: Job
metadata:
name: $USER-$TS
spec:
backoffLimit: 0
ttlSecondsAfterFinished: 30
template:
spec:
containers:
- name: magic
image: robcurrie/ubuntu
imagePullPolicy: Always
resources:
requests:
cpu: "1"
memory: "2G"
ephemeral-storage: "2G"
limits:
cpu: "2"
memory: "3G"
ephemeral-storage: "3G"
command: ["/bin/bash", "-c"]
args: ['for i in {1..100}; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done']
restartPolicy: Never
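One way to submit the job file above: kubectl does not expand shell variables such as $USER and $TS inside the YAML, so you have to render them first. This is a sketch using sed (envsubst from gettext works equally well); the heredoc just stands in for the full job.yml:

```shell
# Render $USER and $TS before submitting -- kubectl does not expand
# shell variables inside the YAML. The heredoc below just stands in
# for the full job.yml shown above.
cat > job.yml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: $USER-$TS
EOF
TS=$(date +%s)                  # a timestamp keeps job names unique
sed -e "s/\$USER/${USER:-nobody}/g" -e "s/\$TS/$TS/g" job.yml > job-rendered.yml
cat job-rendered.yml
# then submit the rendered file:
#   kubectl apply -f job-rendered.yml
```

Job names must be lowercase and DNS-safe, so this only works as-is if your username is.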
'''NOTE:''' Jobs and pods that '''completed''' over 48 hours ago but have not been cleaned up will be automatically removed by the garbage collector. Most jobs will have the "ttlSecondsAfterFinished" configuration item in them, so they will automatically be cleaned up after that time expires. Leaving old pods and jobs around pins the disk space they were using for as long as they remain, so it's good to get rid of them as soon as they are done unless you are debugging a failure.
Jobs that '''run''' over 48 hours will not be deleted, only the ones that have '''exited''' over 48 hours ago.
More good information, including examples and some "How To" documentation, can be found on Rob Currie's GitHub page:
https://github.com/rcurrie/kubernetes
==View the Cluster's Current Activity==
One quick way to check the cluster's utilization is to do:
kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k1.kube 1815m 1% 1191Mi 0%
k2.kube 51837m 53% 46507Mi 12%
k3.kube 1458m 1% 61270Mi 15%
master.kube 111m 5% 1024Mi 46%
This shows that the worker nodes (k1, k2, and k3) are using minimal memory; k2 is at 53% CPU, but there is still plenty of room for new jobs. Ignore the master node, as it only handles cluster management and doesn't run user jobs or pods.
You can also view current resource consumption with our Ganglia cluster monitoring tool:
https://ganglia.gi.ucsc.edu/
That website requires a username and password:
username: genecats
password: KiloKluster
That's mostly for keeping the script kiddies and bots from banging on it.
Once you get in, you should see a drop-down menu near the top left of the screen, next to "Genomics Institute Grid". From the drop-down menu, select "CG Kubernetes Cluster". It will take you to a page detailing the current resource usage and activity on the nodes. This can be useful for seeing whether anyone else is using the whole cluster, or just to get an idea of how many resources are available for your batch of jobs.
851b38228b7e561dc5fb6491865e06f7d5c1205b
161
160
2019-09-23T19:40:55Z
Weiler
3
/* Running Pods and Jobs with Requests and Limits */
wikitext
text/x-wiki
__TOC__
The Computational Genomics Group has a Kubernetes Cluster running on several large instances in AWS. The current cluster makeup includes three worker nodes, each with the following specs:
* 96 CPU cores (3.1 GHz)
* 384 GB RAM
* 3.3 TB Local NVMe Flash Storage
* 25 Gb/s Network Interface
==Getting Authorized to Connect==
If you require access to this kubernetes cluster, contact Benedict Paten to ask for permission to use it, then forward that permission via email to:
cluster-admin@soe.ucsc.edu
Let us know which group you are with and we can authorize you to use the cluster in the correct namespace.
==Authenticating to Kubernetes==
We will authorize (authz) you to use the cluster on the server side, but you will also need to authenticate (authn) using your '@ucsc.edu' email address and a unique JSON Web Token. These credentials are installed in ~/.kube/config on whatever machine you are connecting to the cluster from.
To authenticate and get your base kubernetes configuration, go to this URL (below), which will ask you to authenticate to Google. Use your '@ucsc.edu' email address as the login. It will then ask you to authenticate via CruzID Gold if your web browser doesn't already have the authentication token cached:
https://cg-kube-auth.gi.ucsc.edu
Once you authenticate (via username/password and 2-factor auth for CruzID Gold), it will pass you back to the 'https://cg-kube-auth.gi.ucsc.edu' website and it should confirm authentication on the top with a message saying "Successfully Authenticated". '''If you see any errors in red,''' but are sure you typed in your password and 2-factor auth correctly, click on the above link again (https://cg-kube-auth.gi.ucsc.edu) and authenticate a second time, which should work. There is a quirk where the web token doesn't always pass back to us correctly on the first try.
Upon success, you will be able to click the blue "Download Config File" button to get your initial kubernetes config file. Copy it to your home directory as ~/.kube/config, then follow the directions on the web page to insert your '''"namespace:"''' line. We will let you know which namespace to use.
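For reference, the namespace goes inside the context entry of that file. A minimal sketch of the relevant stanza is below; the cluster, user, and namespace values here are placeholders, so keep whatever is in your downloaded file and use the namespace we assign you:

```yaml
apiVersion: v1
kind: Config
contexts:
- context:
    cluster: cg-kube            # taken from your downloaded file
    user: your-id@ucsc.edu      # taken from your downloaded file
    namespace: your-namespace   # <-- the line you insert
  name: default
current-context: default
```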
==Testing Connectivity==
Once your ~/.kube/config file is set up correctly, you should be able to connect to the cluster. All our shared servers here at the Genomics Institute have the 'kubectl' command installed on them, but if you are coming from somewhere else, just make sure the "kubectl" utility is installed on that machine.
A quick test should go as follows:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k1.kube Ready <none> 13h v1.15.3
k2.kube Ready <none> 13h v1.15.3
k3.kube Ready <none> 13h v1.15.3
master.kube Ready master 13h v1.15.3
==Running Pods and Jobs with Requests and Limits==
When running jobs and pods on kubernetes, you will always want to specify "requests" and "limits" on resources; otherwise your pods will get stuck with the default limits, which are tiny (to protect against runaway pods). You should always have an idea of how many resources your jobs will consume, and avoid requesting much more than that, so you don't hog the cluster. It also prevents your job from "running away" unexpectedly and chewing up more resources than expected.
Here is a good example of a job file that specifies limits:
job.yml
apiVersion: batch/v1
kind: Job
metadata:
name: $USER-$TS
spec:
backoffLimit: 0
ttlSecondsAfterFinished: 30
template:
spec:
containers:
- name: magic
image: robcurrie/ubuntu
imagePullPolicy: Always
resources:
requests:
cpu: "1"
memory: "2G"
ephemeral-storage: "2G"
limits:
cpu: "2"
memory: "3G"
ephemeral-storage: "3G"
command: ["/bin/bash", "-c"]
args: ['for i in {1..100}; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done']
restartPolicy: Never
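One way to submit the job file above: kubectl does not expand shell variables such as $USER and $TS inside the YAML, so you have to render them first. This is a sketch using sed (envsubst from gettext works equally well); the heredoc just stands in for the full job.yml:

```shell
# Render $USER and $TS before submitting -- kubectl does not expand
# shell variables inside the YAML. The heredoc below just stands in
# for the full job.yml shown above.
cat > job.yml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: $USER-$TS
EOF
TS=$(date +%s)                  # a timestamp keeps job names unique
sed -e "s/\$USER/${USER:-nobody}/g" -e "s/\$TS/$TS/g" job.yml > job-rendered.yml
cat job-rendered.yml
# then submit the rendered file:
#   kubectl apply -f job-rendered.yml
```

Job names must be lowercase and DNS-safe, so this only works as-is if your username is.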
'''NOTE:''' Jobs and pods that '''completed''' over 48 hours ago but have not been cleaned up will be automatically removed by the garbage collector. Most jobs will have the "ttlSecondsAfterFinished" configuration item in them, so they will automatically be cleaned up after that time expires. Leaving old pods and jobs around pins the disk space they were using for as long as they remain, so it's good to get rid of them as soon as they are done unless you are debugging a failure.
Jobs that '''run''' over 48 hours will not be deleted, only the ones that have '''exited''' over 48 hours ago.
More good information, including examples and some "How To" documentation, can be found on Rob Currie's GitHub page:
https://github.com/rcurrie/kubernetes
==View the Cluster's Current Activity==
One quick way to check the cluster's utilization is to do:
kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k1.kube 1815m 1% 1191Mi 0%
k2.kube 51837m 53% 46507Mi 12%
k3.kube 1458m 1% 61270Mi 15%
master.kube 111m 5% 1024Mi 46%
This shows that the worker nodes (k1, k2, and k3) are using minimal memory; k2 is at 53% CPU, but there is still plenty of room for new jobs. Ignore the master node, as it only handles cluster management and doesn't run user jobs or pods.
You can also view current resource consumption with our Ganglia cluster monitoring tool:
https://ganglia.gi.ucsc.edu/
That website requires a username and password:
username: genecats
password: KiloKluster
That's mostly for keeping the script kiddies and bots from banging on it.
Once you get in, you should see a drop-down menu near the top left of the screen, next to "Genomics Institute Grid". From the drop-down menu, select "CG Kubernetes Cluster". It will take you to a page detailing the current resource usage and activity on the nodes. This can be useful for seeing whether anyone else is using the whole cluster, or just to get an idea of how many resources are available for your batch of jobs.
a4c12c781a54552d4f1eb24421c83d0b201b9ada
163
161
2019-12-02T23:32:26Z
Weiler
3
/* Running Pods and Jobs with Requests and Limits */
wikitext
text/x-wiki
__TOC__
The Computational Genomics Group has a Kubernetes Cluster running on several large instances in AWS. The current cluster makeup includes three worker nodes, each with the following specs:
* 96 CPU cores (3.1 GHz)
* 384 GB RAM
* 3.3 TB Local NVMe Flash Storage
* 25 Gb/s Network Interface
==Getting Authorized to Connect==
If you require access to this kubernetes cluster, contact Benedict Paten to ask for permission to use it, then forward that permission via email to:
cluster-admin@soe.ucsc.edu
Let us know which group you are with and we can authorize you to use the cluster in the correct namespace.
==Authenticating to Kubernetes==
We will authorize (authz) you to use the cluster on the server side, but you will also need to authenticate (authn) using your '@ucsc.edu' email address and a unique JSON Web Token. These credentials are installed in ~/.kube/config on whatever machine you are connecting to the cluster from.
To authenticate and get your base kubernetes configuration, go to this URL (below), which will ask you to authenticate to Google. Use your '@ucsc.edu' email address as the login. It will then ask you to authenticate via CruzID Gold if your web browser doesn't already have the authentication token cached:
https://cg-kube-auth.gi.ucsc.edu
Once you authenticate (via username/password and 2-factor auth for CruzID Gold), it will pass you back to the 'https://cg-kube-auth.gi.ucsc.edu' website and it should confirm authentication on the top with a message saying "Successfully Authenticated". '''If you see any errors in red,''' but are sure you typed in your password and 2-factor auth correctly, click on the above link again (https://cg-kube-auth.gi.ucsc.edu) and authenticate a second time, which should work. There is a quirk where the web token doesn't always pass back to us correctly on the first try.
Upon success, you will be able to click the blue "Download Config File" button to get your initial kubernetes config file. Copy it to your home directory as ~/.kube/config, then follow the directions on the web page to insert your '''"namespace:"''' line. We will let you know which namespace to use.
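For reference, the namespace goes inside the context entry of that file. A minimal sketch of the relevant stanza is below; the cluster, user, and namespace values here are placeholders, so keep whatever is in your downloaded file and use the namespace we assign you:

```yaml
apiVersion: v1
kind: Config
contexts:
- context:
    cluster: cg-kube            # taken from your downloaded file
    user: your-id@ucsc.edu      # taken from your downloaded file
    namespace: your-namespace   # <-- the line you insert
  name: default
current-context: default
```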
==Testing Connectivity==
Once your ~/.kube/config file is set up correctly, you should be able to connect to the cluster. All our shared servers here at the Genomics Institute have the 'kubectl' command installed on them, but if you are coming from somewhere else, just make sure the "kubectl" utility is installed on that machine.
A quick test should go as follows:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k1.kube Ready <none> 13h v1.15.3
k2.kube Ready <none> 13h v1.15.3
k3.kube Ready <none> 13h v1.15.3
master.kube Ready master 13h v1.15.3
==Running Pods and Jobs with Requests and Limits==
When running jobs and pods on kubernetes, you will always want to specify "requests" and "limits" on resources; otherwise your pods will get stuck with the default limits, which are tiny (to protect against runaway pods). You should always have an idea of how many resources your jobs will consume, and avoid requesting much more than that, so you don't hog the cluster. It also prevents your job from "running away" unexpectedly and chewing up more resources than expected.
Here is a good example of a job file that specifies limits:
job.yml
apiVersion: batch/v1
kind: Job
metadata:
name: $USER-$TS
spec:
backoffLimit: 0
ttlSecondsAfterFinished: 30
template:
spec:
containers:
- name: magic
image: robcurrie/ubuntu
imagePullPolicy: Always
resources:
requests:
cpu: "1"
memory: "2G"
ephemeral-storage: "2G"
limits:
cpu: "1"
memory: "2G"
ephemeral-storage: "3G"
command: ["/bin/bash", "-c"]
args: ['for i in {1..100}; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done']
restartPolicy: Never
Please note that the "request" and "limit" fields should be the same. You might think you could set the limit higher than the request, but in reality they need to match for the pod to stay within the resource envelope the kubernetes scheduler planned for it. If you set the limit higher than the request, you risk the pod using more memory than the scheduler expects, and the node can start OOM-killing random colocated pods, which is very, very bad for the cluster.
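One way to submit the job file above: kubectl does not expand shell variables such as $USER and $TS inside the YAML, so you have to render them first. This is a sketch using sed (envsubst from gettext works equally well); the heredoc just stands in for the full job.yml:

```shell
# Render $USER and $TS before submitting -- kubectl does not expand
# shell variables inside the YAML. The heredoc below just stands in
# for the full job.yml shown above.
cat > job.yml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: $USER-$TS
EOF
TS=$(date +%s)                  # a timestamp keeps job names unique
sed -e "s/\$USER/${USER:-nobody}/g" -e "s/\$TS/$TS/g" job.yml > job-rendered.yml
cat job-rendered.yml
# then submit the rendered file:
#   kubectl apply -f job-rendered.yml
```

Job names must be lowercase and DNS-safe, so this only works as-is if your username is.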
'''NOTE:''' Jobs and pods that '''completed''' over 72 hours ago but have not been cleaned up will be automatically removed by the garbage collector. Most jobs will have the "ttlSecondsAfterFinished" configuration item in them, so they will automatically be cleaned up after that time expires. Leaving old pods and jobs around pins the disk space they were using for as long as they remain, so it's good to get rid of them as soon as they are done unless you are debugging a failure.
Jobs that '''run''' over 72 hours will not be deleted, only the ones that have '''exited''' over 72 hours ago.
More good information, including examples and some "How To" documentation, can be found on Rob Currie's GitHub page:
https://github.com/rcurrie/kubernetes
==View the Cluster's Current Activity==
One quick way to check the cluster's utilization is to do:
kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k1.kube 1815m 1% 1191Mi 0%
k2.kube 51837m 53% 46507Mi 12%
k3.kube 1458m 1% 61270Mi 15%
master.kube 111m 5% 1024Mi 46%
This shows that the worker nodes (k1, k2, and k3) are using minimal memory; k2 is at 53% CPU, but there is still plenty of room for new jobs. Ignore the master node, as it only handles cluster management and doesn't run user jobs or pods.
You can also view current resource consumption with our Ganglia cluster monitoring tool:
https://ganglia.gi.ucsc.edu/
That website requires a username and password:
username: genecats
password: KiloKluster
That's mostly for keeping the script kiddies and bots from banging on it.
Once you get in, you should see a drop-down menu near the top left of the screen, next to "Genomics Institute Grid". From the drop-down menu, select "CG Kubernetes Cluster". It will take you to a page detailing the current resource usage and activity on the nodes. This can be useful for seeing whether anyone else is using the whole cluster, or just to get an idea of how many resources are available for your batch of jobs.
a5ab0e05854b1075c5142967647081eb38c235cf
164
163
2019-12-06T21:12:33Z
Weiler
3
/* View the Cluster's Current Activity */
wikitext
text/x-wiki
__TOC__
The Computational Genomics Group has a Kubernetes Cluster running on several large instances in AWS. The current cluster makeup includes three worker nodes, each with the following specs:
* 96 CPU cores (3.1 GHz)
* 384 GB RAM
* 3.3 TB Local NVMe Flash Storage
* 25 Gb/s Network Interface
==Getting Authorized to Connect==
If you require access to this kubernetes cluster, contact Benedict Paten to ask for permission to use it, then forward that permission via email to:
cluster-admin@soe.ucsc.edu
Let us know which group you are with and we can authorize you to use the cluster in the correct namespace.
==Authenticating to Kubernetes==
We will authorize (authz) you to use the cluster on the server side, but you will also need to authenticate (authn) using your '@ucsc.edu' email address and a unique JSON Web Token. These credentials are installed in ~/.kube/config on whatever machine you are connecting to the cluster from.
To authenticate and get your base kubernetes configuration, go to this URL (below), which will ask you to authenticate to Google. Use your '@ucsc.edu' email address as the login. It will then ask you to authenticate via CruzID Gold if your web browser doesn't already have the authentication token cached:
https://cg-kube-auth.gi.ucsc.edu
Once you authenticate (via username/password and 2-factor auth for CruzID Gold), it will pass you back to the 'https://cg-kube-auth.gi.ucsc.edu' website and it should confirm authentication on the top with a message saying "Successfully Authenticated". '''If you see any errors in red,''' but are sure you typed in your password and 2-factor auth correctly, click on the above link again (https://cg-kube-auth.gi.ucsc.edu) and authenticate a second time, which should work. There is a quirk where the web token doesn't always pass back to us correctly on the first try.
Upon success, you will be able to click the blue "Download Config File" button to get your initial kubernetes config file. Copy it to your home directory as ~/.kube/config, then follow the directions on the web page to insert your '''"namespace:"''' line. We will let you know which namespace to use.
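For reference, the namespace goes inside the context entry of that file. A minimal sketch of the relevant stanza is below; the cluster, user, and namespace values here are placeholders, so keep whatever is in your downloaded file and use the namespace we assign you:

```yaml
apiVersion: v1
kind: Config
contexts:
- context:
    cluster: cg-kube            # taken from your downloaded file
    user: your-id@ucsc.edu      # taken from your downloaded file
    namespace: your-namespace   # <-- the line you insert
  name: default
current-context: default
```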
==Testing Connectivity==
Once your ~/.kube/config file is set up correctly, you should be able to connect to the cluster. All our shared servers here at the Genomics Institute have the 'kubectl' command installed on them, but if you are coming from somewhere else, just make sure the "kubectl" utility is installed on that machine.
A quick test should go as follows:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k1.kube Ready <none> 13h v1.15.3
k2.kube Ready <none> 13h v1.15.3
k3.kube Ready <none> 13h v1.15.3
master.kube Ready master 13h v1.15.3
==Running Pods and Jobs with Requests and Limits==
When running jobs and pods on kubernetes, you will always want to specify "requests" and "limits" on resources; otherwise your pods will get stuck with the default limits, which are tiny (to protect against runaway pods). You should always have an idea of how many resources your jobs will consume, and avoid requesting much more than that, so you don't hog the cluster. It also prevents your job from "running away" unexpectedly and chewing up more resources than expected.
Here is a good example of a job file that specifies limits:
job.yml
apiVersion: batch/v1
kind: Job
metadata:
name: $USER-$TS
spec:
backoffLimit: 0
ttlSecondsAfterFinished: 30
template:
spec:
containers:
- name: magic
image: robcurrie/ubuntu
imagePullPolicy: Always
resources:
requests:
cpu: "1"
memory: "2G"
ephemeral-storage: "2G"
limits:
cpu: "1"
memory: "2G"
ephemeral-storage: "3G"
command: ["/bin/bash", "-c"]
args: ['for i in {1..100}; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done']
restartPolicy: Never
Please note that the "request" and "limit" fields should be the same. You might think you could set the limit higher than the request, but in reality they need to match for the pod to stay within the resource envelope the kubernetes scheduler planned for it. If you set the limit higher than the request, you risk the pod using more memory than the scheduler expects, and the node can start OOM-killing random colocated pods, which is very, very bad for the cluster.
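One way to submit the job file above: kubectl does not expand shell variables such as $USER and $TS inside the YAML, so you have to render them first. This is a sketch using sed (envsubst from gettext works equally well); the heredoc just stands in for the full job.yml:

```shell
# Render $USER and $TS before submitting -- kubectl does not expand
# shell variables inside the YAML. The heredoc below just stands in
# for the full job.yml shown above.
cat > job.yml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: $USER-$TS
EOF
TS=$(date +%s)                  # a timestamp keeps job names unique
sed -e "s/\$USER/${USER:-nobody}/g" -e "s/\$TS/$TS/g" job.yml > job-rendered.yml
cat job-rendered.yml
# then submit the rendered file:
#   kubectl apply -f job-rendered.yml
```

Job names must be lowercase and DNS-safe, so this only works as-is if your username is.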
'''NOTE:''' Jobs and pods that '''completed''' over 72 hours ago but have not been cleaned up will be automatically removed by the garbage collector. Most jobs will have the "ttlSecondsAfterFinished" configuration item in them, so they will automatically be cleaned up after that time expires. Leaving old pods and jobs around pins the disk space they were using for as long as they remain, so it's good to get rid of them as soon as they are done unless you are debugging a failure.
Jobs that '''run''' over 72 hours will not be deleted, only the ones that have '''exited''' over 72 hours ago.
More good information, including examples and some "How To" documentation, can be found on Rob Currie's GitHub page:
https://github.com/rcurrie/kubernetes
==View the Cluster's Current Activity==
One quick way to check the cluster's utilization is to do:
kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k1.kube 1815m 1% 1191Mi 0%
k2.kube 51837m 53% 46507Mi 12%
k3.kube 1458m 1% 61270Mi 15%
master.kube 111m 5% 1024Mi 46%
This shows that the worker nodes (k1, k2, and k3) are using minimal memory; k2 is at 53% CPU, but there is still plenty of room for new jobs. Ignore the master node, as it only handles cluster management and doesn't run user jobs or pods.
Another good way to get a lot of details about the current state of the cluster is through the Kubernetes Dashboard:
https://cgl-k8s-dashboard.gi.ucsc.edu/
Select the "token" login method, and paste in this (long) token:
eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZC10b2tlbi0ycDY4cCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImE5ZGI2Y2I0LWMyYWYtNDM3My04ZmM2LWE4YWYwYTBmNGRkNCIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlcm5ldGVzLWRhc2hib2FyZDprdWJlcm5ldGVzLWRhc2hib2FyZCJ9.ZVQjryG_ksfvReIfq4Frb6M4sE6OVDOXnFy9Aii-h3mrpdHRE6bgjdAvSGZ0jJSIUEz5GgPBQ0lCwhyZocivHHr4zTrNxMkOFZhPDnpvF6RVIDWTkqmH9Dg6qmro0gTJP75oKBpt7dFN2pW4zvqOAzqPmh7qxfoVusN8X6U13YirMFEf65-aGL-_FFNBsEzvjkC-BgXWbtk3YZc8CJL7xtvlKLyE6u6jC9Qx0SWnwzkALlxmzo_yYTDKpIrWiQGEqzLQOxKml-H0kSYLDX-t4sTivXp4vCw_ruoqwIpLnnQAC7q3ZtSTxHIrxbB7n_M8gfhpXtwprbPav-XmBk1xaQ
The dashboard is read-only, so you won't be able to edit anything; it's mostly for seeing what's going on and where.
You can also view current resource consumption with our Ganglia cluster monitoring tool:
https://ganglia.gi.ucsc.edu/
That website requires a username and password:
username: genecats
password: KiloKluster
That's mostly for keeping the script kiddies and bots from banging on it.
Once you get in, you should see a drop-down menu near the top left of the screen, next to "Genomics Institute Grid". From the drop-down menu, select "CG Kubernetes Cluster". It will take you to a page detailing the current resource usage and activity on the nodes. This can be useful for seeing whether anyone else is using the whole cluster, or just to get an idea of how many resources are available for your batch of jobs.
172607f3053ab16dcd66cf09d8cce4fb08a89e3a
165
164
2019-12-06T21:12:58Z
Weiler
3
/* View the Cluster's Current Activity */
wikitext
text/x-wiki
__TOC__
The Computational Genomics Group has a Kubernetes Cluster running on several large instances in AWS. The current cluster makeup includes three worker nodes, each with the following specs:
* 96 CPU cores (3.1 GHz)
* 384 GB RAM
* 3.3 TB Local NVMe Flash Storage
* 25 Gb/s Network Interface
==Getting Authorized to Connect==
If you require access to this kubernetes cluster, contact Benedict Paten to ask for permission to use it, then forward that permission via email to:
cluster-admin@soe.ucsc.edu
Let us know which group you are with and we can authorize you to use the cluster in the correct namespace.
==Authenticating to Kubernetes==
We will authorize (authz) you to use the cluster on the server side, but you will also need to authenticate (authn) using your '@ucsc.edu' email address and a unique JSON Web Token. These credentials are installed in ~/.kube/config on whatever machine you are connecting to the cluster from.
To authenticate and get your base kubernetes configuration, go to this URL (below), which will ask you to authenticate to Google. Use your '@ucsc.edu' email address as the login. It will then ask you to authenticate via CruzID Gold if your web browser doesn't already have the authentication token cached:
https://cg-kube-auth.gi.ucsc.edu
Once you authenticate (via username/password and 2-factor auth for CruzID Gold), it will pass you back to the 'https://cg-kube-auth.gi.ucsc.edu' website and it should confirm authentication on the top with a message saying "Successfully Authenticated". '''If you see any errors in red,''' but are sure you typed in your password and 2-factor auth correctly, click on the above link again (https://cg-kube-auth.gi.ucsc.edu) and authenticate a second time, which should work. There is a quirk where the web token doesn't always pass back to us correctly on the first try.
Upon success, you will be able to click the blue "Download Config File" button to get your initial kubernetes config file. Copy it to your home directory as ~/.kube/config, then follow the directions on the web page to insert your '''"namespace:"''' line. We will let you know which namespace to use.
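For reference, the namespace goes inside the context entry of that file. A minimal sketch of the relevant stanza is below; the cluster, user, and namespace values here are placeholders, so keep whatever is in your downloaded file and use the namespace we assign you:

```yaml
apiVersion: v1
kind: Config
contexts:
- context:
    cluster: cg-kube            # taken from your downloaded file
    user: your-id@ucsc.edu      # taken from your downloaded file
    namespace: your-namespace   # <-- the line you insert
  name: default
current-context: default
```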
==Testing Connectivity==
Once your ~/.kube/config file is set up correctly, you should be able to connect to the cluster. All our shared servers here at the Genomics Institute have the 'kubectl' command installed on them, but if you are coming from somewhere else, just make sure the "kubectl" utility is installed on that machine.
A quick test should go as follows:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k1.kube Ready <none> 13h v1.15.3
k2.kube Ready <none> 13h v1.15.3
k3.kube Ready <none> 13h v1.15.3
master.kube Ready master 13h v1.15.3
==Running Pods and Jobs with Requests and Limits==
When running jobs and pods on kubernetes, you will always want to specify "requests" and "limits" on resources; otherwise your pods will get stuck with the default limits, which are tiny (to protect against runaway pods). You should always have an idea of how many resources your jobs will consume, and avoid requesting much more than that, so you don't hog the cluster. It also prevents your job from "running away" unexpectedly and chewing up more resources than expected.
Here is a good example of a job file that specifies limits:
job.yml
apiVersion: batch/v1
kind: Job
metadata:
name: $USER-$TS
spec:
backoffLimit: 0
ttlSecondsAfterFinished: 30
template:
spec:
containers:
- name: magic
image: robcurrie/ubuntu
imagePullPolicy: Always
resources:
requests:
cpu: "1"
memory: "2G"
ephemeral-storage: "2G"
limits:
cpu: "1"
memory: "2G"
ephemeral-storage: "3G"
command: ["/bin/bash", "-c"]
args: ['for i in {1..100}; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done']
restartPolicy: Never
Please note that the "request" and "limit" fields should be the same. You might think you could set the limit higher than the request, but in reality they need to match for the pod to stay within the resource envelope the kubernetes scheduler planned for it. If you set the limit higher than the request, you risk the pod using more memory than the scheduler expects, and the node can start OOM-killing random colocated pods, which is very, very bad for the cluster.
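One way to submit the job file above: kubectl does not expand shell variables such as $USER and $TS inside the YAML, so you have to render them first. This is a sketch using sed (envsubst from gettext works equally well); the heredoc just stands in for the full job.yml:

```shell
# Render $USER and $TS before submitting -- kubectl does not expand
# shell variables inside the YAML. The heredoc below just stands in
# for the full job.yml shown above.
cat > job.yml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: $USER-$TS
EOF
TS=$(date +%s)                  # a timestamp keeps job names unique
sed -e "s/\$USER/${USER:-nobody}/g" -e "s/\$TS/$TS/g" job.yml > job-rendered.yml
cat job-rendered.yml
# then submit the rendered file:
#   kubectl apply -f job-rendered.yml
```

Job names must be lowercase and DNS-safe, so this only works as-is if your username is.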
'''NOTE:''' Jobs and pods that '''completed''' over 72 hours ago but have not been cleaned up will be automatically removed by the garbage collector. Most jobs will have the "ttlSecondsAfterFinished" configuration item in them, so they will automatically be cleaned up after that time expires. Leaving old pods and jobs around pins the disk space they were using for as long as they remain, so it's good to get rid of them as soon as they are done unless you are debugging a failure.
Jobs that '''run''' over 72 hours will not be deleted, only the ones that have '''exited''' over 72 hours ago.
More good information, including examples and some "How To" documentation, can be found on Rob Currie's GitHub page:
https://github.com/rcurrie/kubernetes
==View the Cluster's Current Activity==
One quick way to check the cluster's utilization is to do:
kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k1.kube 1815m 1% 1191Mi 0%
k2.kube 51837m 53% 46507Mi 12%
k3.kube 1458m 1% 61270Mi 15%
master.kube 111m 5% 1024Mi 46%
This shows that the worker nodes (k1, k2, and k3) are using minimal memory; k2 is at 53% CPU, but there is still plenty of room for new jobs. Ignore the master node, as it only handles cluster management and doesn't run user jobs or pods.
Another good way to get a lot of details about the current state of the cluster is through the Kubernetes Dashboard:
https://cgl-k8s-dashboard.gi.ucsc.edu/
Select the "token" login method, and paste in this (long) token:
eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZC10b2tlbi0ycDY4cCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImE5ZGI2Y2I0LWMyYWYtNDM3My04ZmM2LWE4YWYwYTBmNGRkNCIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlcm5ldGVzLWRhc2hib2FyZDprdWJlcm5ldGVzLWRhc2hib2FyZCJ9.ZVQjryG_ksfvReIfq4Frb6M4sE6OVDOXnFy9Aii-h3mrpdHRE6bgjdAvSGZ0jJSIUEz5GgPBQ0lCwhyZocivHHr4zTrNxMkOFZhPDnpvF6RVIDWTkqmH9Dg6qmro0gTJP75oKBpt7dFN2pW4zvqOAzqPmh7qxfoVusN8X6U13YirMFEf65-aGL-_FFNBsEzvjkC-BgXWbtk3YZc8CJL7xtvlKLyE6u6jC9Qx0SWnwzkALlxmzo_yYTDKpIrWiQGEqzLQOxKml-H0kSYLDX-t4sTivXp4vCw_ruoqwIpLnnQAC7q3ZtSTxHIrxbB7n_M8gfhpXtwprbPav-XmBk1xaQ
The dashboard is read-only, so you won't be able to edit anything; it's mostly for seeing what's going on and where.
You can also view current resource consumption with our Ganglia cluster monitoring tool:
https://ganglia.gi.ucsc.edu/
That website requires a username and password:
username: genecats
password: KiloKluster
That's mostly for keeping the script kiddies and bots from banging on it.
Once you get in, you should see a drop-down menu near the top left of the screen, next to "Genomics Institute Grid". From the drop-down menu, select "CG Kubernetes Cluster". It will take you to a page detailing the current resource usage and activity on the nodes. This can be useful for seeing whether anyone else is using the whole cluster, or just to get an idea of how many resources are available for your batch of jobs.
7ae2d0b83584b006e70ab81949dd28ca08cc8d44
177
165
2020-02-05T22:34:40Z
Weiler
3
/* Running Pods and Jobs with Requests and Limits */
wikitext
text/x-wiki
__TOC__
The Computational Genomics Group has a Kubernetes Cluster running on several large instances in AWS. The current cluster makeup includes three worker nodes, each with the following specs:
* 96 CPU cores (3.1 GHz)
* 384 GB RAM
* 3.3 TB Local NVMe Flash Storage
* 25 Gb/s Network Interface
==Getting Authorized to Connect==
If you require access to this kubernetes cluster, contact Benedict Paten to ask for permission to use it, then forward that permission via email to:
cluster-admin@soe.ucsc.edu
Let us know which group you are with and we can authorize you to use the cluster in the correct namespace.
==Authenticating to Kubernetes==
We will authorize (authz) you to use the cluster on the server side, but you will also need to authenticate (authn) using your '@ucsc.edu' email address and a unique JSON Web Token. These credentials are installed in ~/.kube/config on whatever machine you are connecting to the cluster from.
To authenticate and get your base kubernetes configuration, go to this URL (below), which will ask you to authenticate to Google. Use your '@ucsc.edu' email address as the login. It will then ask you to authenticate via CruzID Gold if your web browser doesn't already have the authentication token cached:
https://cg-kube-auth.gi.ucsc.edu
Once you authenticate (via username/password and 2-factor auth for CruzID Gold), it will pass you back to the 'https://cg-kube-auth.gi.ucsc.edu' website and it should confirm authentication on the top with a message saying "Successfully Authenticated". '''If you see any errors in red,''' but are sure you typed in your password and 2-factor auth correctly, click on the above link again (https://cg-kube-auth.gi.ucsc.edu) and authenticate a second time, which should work. There is a quirk where the web token doesn't always pass back to us correctly on the first try.
Upon success, you will be able to click the blue "Download Config File" button, which contains your initial kubernetes config file. Copy this file to your home directory as ~/.kube/config. Follow the directions on the web page to insert your '''"namespace:"''' line as directed. We will let you know which namespace to use.
==Testing Connectivity==
Once your ~/.kube/config file is set up correctly, you should be able to connect to the cluster. All our shared servers here at the Genomics Institute have the 'kubectl' command installed on them, but if you are coming from somewhere else, just make sure the "kubectl" utility is installed on that machine.
A quick test should go as follows:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k1.kube Ready <none> 13h v1.15.3
k2.kube Ready <none> 13h v1.15.3
k3.kube Ready <none> 13h v1.15.3
master.kube Ready master 13h v1.15.3
==Running Pods and Jobs with Requests and Limits==
When running jobs and pods on Kubernetes, you will always want to specify "requests" and "limits" on resources; otherwise your pods will be stuck with the default limits, which are tiny (to protect against runaway pods). You should always have an idea of how many resources your jobs will consume, and not request much more than that, so you don't hog the cluster. Limits also prevent your job from unexpectedly "running away" and chewing up more resources than expected.
Here is a good example of a job file that specifies limits:
 job.yml
 apiVersion: batch/v1
 kind: Job
 metadata:
   name: $USER-$TS
 spec:
   backoffLimit: 0
   ttlSecondsAfterFinished: 30
   template:
     spec:
       containers:
       - name: magic
         image: robcurrie/ubuntu
         imagePullPolicy: Always
         resources:
           requests:
             cpu: "1"
             memory: "2G"
             ephemeral-storage: "2G"
           limits:
             cpu: "1"
             memory: "2G"
             ephemeral-storage: "3G"
         command: ["/bin/bash", "-c"]
         args: ['for i in {1..100}; do echo "$i: $(date)"; sleep 1; done']
       restartPolicy: Never
       priorityClassName: medium-priority
Please note that the "requests" and "limits" fields should be the same. You might think you could set the limit higher than the request, but in reality they need to match for the pod to stay within the resource envelope the Kubernetes scheduler planned for it. If you set the limit higher than the request, you risk the pod using more memory than the scheduler expects, and the node can start OOM-killing other colocated pods at random, which is very, very bad for the cluster.
Also note the "priorityClassName" line. Available values are:
high-priority
medium-priority
low-priority
That affects how quickly your jobs move up the queue in the event there are a lot of queued jobs. Always use "medium-priority" as the default unless you specifically know you need it higher or lower. Higher priority jobs will always go in front of lower priority jobs.
'''NOTE:''' Jobs and pods that '''completed''' over 72 hours ago but have not been cleaned up will be automatically removed by the garbage collector. Most jobs will have the "ttlSecondsAfterFinished" configuration item in them, so they will automatically be cleaned up after that time expires. Leaving old pods and jobs around pins the disk space they were using, so it's good to get rid of them as soon as they are done, unless you are debugging a failure or something like that.
Jobs that '''run''' over 72 hours will not be deleted, only the ones that have '''exited''' over 72 hours ago.
A lot of other good information can be found on Rob Currie's GitHub page, which includes examples and some "How To" documentation:
https://github.com/rcurrie/kubernetes
==View the Cluster's Current Activity==
One quick way to check the cluster's utilization is to do:
kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k1.kube 1815m 1% 1191Mi 0%
k2.kube 51837m 53% 46507Mi 12%
k3.kube 1458m 1% 61270Mi 15%
master.kube 111m 5% 1024Mi 46%
This shows that the worker nodes k1, k2 and k3 are using minimal memory, and k2 is at about 53% CPU, so there is still plenty of room for new jobs. Ignore the master node: it only handles cluster management and doesn't run user jobs or pods.
Another good way to get a lot of details about the current state of the cluster is through the Kubernetes Dashboard:
https://cgl-k8s-dashboard.gi.ucsc.edu/
Select the "token" login method, and paste in this (long) token:
eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZC10b2tlbi0ycDY4cCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImE5ZGI2Y2I0LWMyYWYtNDM3My04ZmM2LWE4YWYwYTBmNGRkNCIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlcm5ldGVzLWRhc2hib2FyZDprdWJlcm5ldGVzLWRhc2hib2FyZCJ9.ZVQjryG_ksfvReIfq4Frb6M4sE6OVDOXnFy9Aii-h3mrpdHRE6bgjdAvSGZ0jJSIUEz5GgPBQ0lCwhyZocivHHr4zTrNxMkOFZhPDnpvF6RVIDWTkqmH9Dg6qmro0gTJP75oKBpt7dFN2pW4zvqOAzqPmh7qxfoVusN8X6U13YirMFEf65-aGL-_FFNBsEzvjkC-BgXWbtk3YZc8CJL7xtvlKLyE6u6jC9Qx0SWnwzkALlxmzo_yYTDKpIrWiQGEqzLQOxKml-H0kSYLDX-t4sTivXp4vCw_ruoqwIpLnnQAC7q3ZtSTxHIrxbB7n_M8gfhpXtwprbPav-XmBk1xaQ
The dashboard is read-only, so you won't be able to edit anything; it's mostly for seeing what's going on and where.
You can also check current resource consumption with our Ganglia cluster monitoring tool:
https://ganglia.gi.ucsc.edu/
That website requires a username and password:
username: genecats
password: KiloKluster
That's mostly to keep script kiddies and bots from banging on it.
Once you get in, you should see a drop-down menu near the top left of the screen, next to "Genomics Institute Grid". From the drop-down menu, select "CG Kubernetes Cluster". It will take you to a page detailing the current resource usage and activity on the nodes. This can be useful for seeing whether anyone else is using the whole cluster, or just to get an idea of how many resources are available for your batch of jobs.
6120c01ff0535d179f5078d8447ffb35b12597f0
178
177
2020-03-13T22:15:58Z
Anovak
4
Note that limits can replace requests if unspecified.
wikitext
text/x-wiki
__TOC__
The Computational Genomics Group has a Kubernetes Cluster running on several large instances in AWS. The current cluster makeup includes three worker nodes, each with the following specs:
* 96 CPU cores (3.1 GHz)
* 384 GB RAM
* 3.3 TB Local NVMe Flash Storage
* 25 Gb/s Network Interface
==Getting Authorized to Connect==
If you require access to this Kubernetes cluster, contact Benedict Paten to ask for permission to use it, then forward that permission via email to:
cluster-admin@soe.ucsc.edu
Let us know which group you are with and we can authorize you to use the cluster in the correct namespace.
==Authenticating to Kubernetes==
We will authorize (authz) you to use the cluster on the server side, but you will also need to authenticate (authn) using your '@ucsc.edu' email address and a unique JSON Web Token (JWT). These credentials are installed in ~/.kube/config on whatever machine you are connecting from.
To authenticate and get your base kubernetes configuration, go to this URL (below), which will ask you to authenticate to Google. Use your '@ucsc.edu' email address as the login. It will then ask you to authenticate via CruzID Gold if your web browser doesn't already have the authentication token cached:
https://cg-kube-auth.gi.ucsc.edu
Once you authenticate (via username/password and 2-factor auth for CruzID Gold), it will pass you back to the 'https://cg-kube-auth.gi.ucsc.edu' website and it should confirm authentication on the top with a message saying "Successfully Authenticated". '''If you see any errors in red,''' but are sure you typed in your password and 2-factor auth correctly, click on the above link again (https://cg-kube-auth.gi.ucsc.edu) and authenticate a second time, which should work. There is a quirk where the web token doesn't always pass back to us correctly on the first try.
Upon success, you will be able to click the blue "Download Config File" button, which contains your initial kubernetes config file. Copy this file to your home directory as ~/.kube/config. Follow the directions on the web page to insert your '''"namespace:"''' line as directed. We will let you know which namespace to use.
==Testing Connectivity==
Once your ~/.kube/config file is set up correctly, you should be able to connect to the cluster. All our shared servers here at the Genomics Institute have the 'kubectl' command installed on them, but if you are coming from somewhere else, just make sure the "kubectl" utility is installed on that machine.
A quick test should go as follows:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k1.kube Ready <none> 13h v1.15.3
k2.kube Ready <none> 13h v1.15.3
k3.kube Ready <none> 13h v1.15.3
master.kube Ready master 13h v1.15.3
==Running Pods and Jobs with Requests and Limits==
When running jobs and pods on Kubernetes, you will always want to specify "requests" and "limits" on resources; otherwise your pods will be stuck with the default limits, which are tiny (to protect against runaway pods). You should always have an idea of how many resources your jobs will consume, and not request much more than that, so you don't hog the cluster. Limits also prevent your job from unexpectedly "running away" and chewing up more resources than expected.
Here is a good example of a job file that specifies limits:
 job.yml
 apiVersion: batch/v1
 kind: Job
 metadata:
   name: $USER-$TS
 spec:
   backoffLimit: 0
   ttlSecondsAfterFinished: 30
   template:
     spec:
       containers:
       - name: magic
         image: robcurrie/ubuntu
         imagePullPolicy: Always
         resources:
           requests:
             cpu: "1"
             memory: "2G"
             ephemeral-storage: "2G"
           limits:
             cpu: "1"
             memory: "2G"
             ephemeral-storage: "3G"
         command: ["/bin/bash", "-c"]
         args: ['for i in {1..100}; do echo "$i: $(date)"; sleep 1; done']
       restartPolicy: Never
       priorityClassName: medium-priority
Please note that the "requests" and "limits" fields should be the same. You might think you could set the limit higher than the request, but in reality they need to match for the pod to stay within the resource envelope the Kubernetes scheduler planned for it. If you set the limit higher than the request, you risk the pod using more memory than the scheduler expects, and the node can start OOM-killing other colocated pods at random, which is very, very bad for the cluster. If you omit the "requests" section altogether, the limit values will be used as the requests, so if you use only one, use "limits".
Also note the "priorityClassName" line. Available values are:
high-priority
medium-priority
low-priority
That affects how quickly your jobs move up the queue in the event there are a lot of queued jobs. Always use "medium-priority" as the default unless you specifically know you need it higher or lower. Higher priority jobs will always go in front of lower priority jobs.
'''NOTE:''' Jobs and pods that '''completed''' over 72 hours ago but have not been cleaned up will be automatically removed by the garbage collector. Most jobs will have the "ttlSecondsAfterFinished" configuration item in them, so they will automatically be cleaned up after that time expires. Leaving old pods and jobs around pins the disk space they were using, so it's good to get rid of them as soon as they are done, unless you are debugging a failure or something like that.
Jobs that '''run''' over 72 hours will not be deleted, only the ones that have '''exited''' over 72 hours ago.
A lot of other good information can be found on Rob Currie's GitHub page, which includes examples and some "How To" documentation:
https://github.com/rcurrie/kubernetes
==View the Cluster's Current Activity==
One quick way to check the cluster's utilization is to do:
kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k1.kube 1815m 1% 1191Mi 0%
k2.kube 51837m 53% 46507Mi 12%
k3.kube 1458m 1% 61270Mi 15%
master.kube 111m 5% 1024Mi 46%
This shows that the worker nodes k1, k2 and k3 are using minimal memory, and k2 is at about 53% CPU, so there is still plenty of room for new jobs. Ignore the master node: it only handles cluster management and doesn't run user jobs or pods.
Another good way to get a lot of details about the current state of the cluster is through the Kubernetes Dashboard:
https://cgl-k8s-dashboard.gi.ucsc.edu/
Select the "token" login method, and paste in this (long) token:
eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZC10b2tlbi0ycDY4cCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImE5ZGI2Y2I0LWMyYWYtNDM3My04ZmM2LWE4YWYwYTBmNGRkNCIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlcm5ldGVzLWRhc2hib2FyZDprdWJlcm5ldGVzLWRhc2hib2FyZCJ9.ZVQjryG_ksfvReIfq4Frb6M4sE6OVDOXnFy9Aii-h3mrpdHRE6bgjdAvSGZ0jJSIUEz5GgPBQ0lCwhyZocivHHr4zTrNxMkOFZhPDnpvF6RVIDWTkqmH9Dg6qmro0gTJP75oKBpt7dFN2pW4zvqOAzqPmh7qxfoVusN8X6U13YirMFEf65-aGL-_FFNBsEzvjkC-BgXWbtk3YZc8CJL7xtvlKLyE6u6jC9Qx0SWnwzkALlxmzo_yYTDKpIrWiQGEqzLQOxKml-H0kSYLDX-t4sTivXp4vCw_ruoqwIpLnnQAC7q3ZtSTxHIrxbB7n_M8gfhpXtwprbPav-XmBk1xaQ
The dashboard is read-only, so you won't be able to edit anything; it's mostly for seeing what's going on and where.
You can also check current resource consumption with our Ganglia cluster monitoring tool:
https://ganglia.gi.ucsc.edu/
That website requires a username and password:
username: genecats
password: KiloKluster
That's mostly to keep script kiddies and bots from banging on it.
Once you get in, you should see a drop-down menu near the top left of the screen, next to "Genomics Institute Grid". From the drop-down menu, select "CG Kubernetes Cluster". It will take you to a page detailing the current resource usage and activity on the nodes. This can be useful for seeing whether anyone else is using the whole cluster, or just to get an idea of how many resources are available for your batch of jobs.
06d1d9162c5d6b73723bb5eb2644e5cf0113d333
179
178
2020-03-13T22:27:16Z
Anovak
4
wikitext
text/x-wiki
__TOC__
The Computational Genomics Group has a Kubernetes Cluster running on several large instances in AWS. The current cluster makeup includes three worker nodes, each with the following specs:
* 96 CPU cores (3.1 GHz)
* 384 GB RAM
* 3.3 TB Local NVMe Flash Storage
* 25 Gb/s Network Interface
==Getting Authorized to Connect==
If you require access to this Kubernetes cluster, contact Benedict Paten to ask for permission to use it, then forward that permission via email to:
cluster-admin@soe.ucsc.edu
Let us know which group you are with and we can authorize you to use the cluster in the correct namespace.
==Authenticating to Kubernetes==
We will authorize (authz) you to use the cluster on the server side, but you will also need to authenticate (authn) using your '@ucsc.edu' email address and a unique JSON Web Token (JWT). These credentials are installed in ~/.kube/config on whatever machine you are connecting from.
To authenticate and get your base kubernetes configuration, go to this URL (below), which will ask you to authenticate to Google. Use your '@ucsc.edu' email address as the login. It will then ask you to authenticate via CruzID Gold if your web browser doesn't already have the authentication token cached:
https://cg-kube-auth.gi.ucsc.edu
Once you authenticate (via username/password and 2-factor auth for CruzID Gold), it will pass you back to the 'https://cg-kube-auth.gi.ucsc.edu' website and it should confirm authentication on the top with a message saying "Successfully Authenticated". '''If you see any errors in red,''' but are sure you typed in your password and 2-factor auth correctly, click on the above link again (https://cg-kube-auth.gi.ucsc.edu) and authenticate a second time, which should work. There is a quirk where the web token doesn't always pass back to us correctly on the first try.
Upon success, you will be able to click the blue "Download Config File" button, which contains your initial kubernetes config file. Copy this file to your home directory as ~/.kube/config. Follow the directions on the web page to insert your '''"namespace:"''' line as directed. We will let you know which namespace to use.
==Testing Connectivity==
Once your ~/.kube/config file is set up correctly, you should be able to connect to the cluster. All our shared servers here at the Genomics Institute have the 'kubectl' command installed on them, but if you are coming from somewhere else, just make sure the "kubectl" utility is installed on that machine.
A quick test should go as follows:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k1.kube Ready <none> 13h v1.15.3
k2.kube Ready <none> 13h v1.15.3
k3.kube Ready <none> 13h v1.15.3
master.kube Ready master 13h v1.15.3
==Running Pods and Jobs with Requests and Limits==
When running jobs and pods on Kubernetes, you will always want to specify "requests" and "limits" on resources; otherwise your pods will be stuck with the default limits, which are tiny (to protect against runaway pods). You should always have an idea of how many resources your jobs will consume, and not request much more than that, so you don't hog the cluster. Limits also prevent your job from unexpectedly "running away" and chewing up more resources than expected.
Here is a good example of a job file that specifies limits:
 job.yml
 apiVersion: batch/v1
 kind: Job
 metadata:
   name: $USER-$TS
 spec:
   backoffLimit: 0
   ttlSecondsAfterFinished: 30
   template:
     spec:
       containers:
       - name: magic
         image: robcurrie/ubuntu
         imagePullPolicy: Always
         resources:
           requests:
             cpu: "1"
             memory: "2G"
             ephemeral-storage: "2G"
           limits:
             cpu: "1"
             memory: "2G"
             ephemeral-storage: "3G"
         command: ["/bin/bash", "-c"]
         args: ['for i in {1..100}; do echo "$i: $(date)"; sleep 1; done']
       restartPolicy: Never
       priorityClassName: medium-priority
Please note that the "requests" and "limits" fields should be the same. You might think you could set the limit higher than the request, but in reality they need to match for the pod to stay within the resource envelope the Kubernetes scheduler planned for it. If you set the limit higher than the request, you risk the pod using more memory than the scheduler expects, and the node can start OOM-killing other colocated pods at random, which is very, very bad for the cluster. If you omit the "requests" section altogether, the limit values will be used as the requests, so if you use only one, use "limits".
Also note the "priorityClassName" line. Available values are:
high-priority
medium-priority
low-priority
That affects how quickly your jobs move up the queue in the event there are a lot of queued jobs. Always use "medium-priority" as the default unless you specifically know you need it higher or lower. Higher priority jobs will always go in front of lower priority jobs.
'''NOTE:''' Jobs and pods that '''completed''' over 72 hours ago but have not been cleaned up will be automatically removed by the garbage collector. Most jobs will have the "ttlSecondsAfterFinished" configuration item in them, so they will automatically be cleaned up after that time expires. Leaving old pods and jobs around pins the disk space they were using, so it's good to get rid of them as soon as they are done, unless you are debugging a failure or something like that.
Jobs that '''run''' over 72 hours will not be deleted, only the ones that have '''exited''' over 72 hours ago.
A lot of other good information can be found on Rob Currie's GitHub page, which includes examples and some "How To" documentation:
https://github.com/rcurrie/kubernetes
== Inlining Jobs in Shell and Shell in Jobs ==
When interactively developing on Kubernetes, it can be useful to have a shell command you can copy and paste to run a Kubernetes job, rather than having to create YAML files on disk. Similarly, it can be useful to have shell scripting inline in your Kubernetes job definitions, rather than having to bake your experimental script into a Docker container. Here's an example that does both, putting the YAML inside a heredoc and putting the script to run in the container inside a multiline YAML string. We precede it with a command to delete the job, so you can modify your script and re-paste it to replace a failed or failing job. We also mount the AWS credentials in the container, so that the ''aws'' command will be able to access S3 if you install it.
kubectl delete job username-job
kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: username-job
spec:
  ttlSecondsAfterFinished: 1000
  template:
    spec:
      containers:
      - name: main
        imagePullPolicy: Always
        image: ubuntu:18.04
        command:
        - /bin/bash
        - -c
        - |
          set -e
          DEBIAN_FRONTEND=noninteractive apt-get update
          DEBIAN_FRONTEND=noninteractive apt-get install -y awscli cowsay
          cowsay "Listing files"
          aws s3 ls s3://vg-k8s/
        volumeMounts:
        - mountPath: /tmp
          name: scratch-volume
        - mountPath: /root/.aws
          name: s3-credentials
        resources:
          limits:
            cpu: 1
            memory: "4Gi"
            ephemeral-storage: "10Gi"
      restartPolicy: Never
      volumes:
      - name: scratch-volume
        emptyDir: {}
      - name: s3-credentials
        secret:
          secretName: shared-s3-credentials
  backoffLimit: 0
EOF
Make sure to replace "username-job" with a unique job name that includes ''your'' username.
==View the Cluster's Current Activity==
One quick way to check the cluster's utilization is to do:
kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k1.kube 1815m 1% 1191Mi 0%
k2.kube 51837m 53% 46507Mi 12%
k3.kube 1458m 1% 61270Mi 15%
master.kube 111m 5% 1024Mi 46%
This shows that the worker nodes k1, k2 and k3 are using minimal memory, and k2 is at about 53% CPU, so there is still plenty of room for new jobs. Ignore the master node: it only handles cluster management and doesn't run user jobs or pods.
Another good way to get a lot of details about the current state of the cluster is through the Kubernetes Dashboard:
https://cgl-k8s-dashboard.gi.ucsc.edu/
Select the "token" login method, and paste in this (long) token:
eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZC10b2tlbi0ycDY4cCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImE5ZGI2Y2I0LWMyYWYtNDM3My04ZmM2LWE4YWYwYTBmNGRkNCIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlcm5ldGVzLWRhc2hib2FyZDprdWJlcm5ldGVzLWRhc2hib2FyZCJ9.ZVQjryG_ksfvReIfq4Frb6M4sE6OVDOXnFy9Aii-h3mrpdHRE6bgjdAvSGZ0jJSIUEz5GgPBQ0lCwhyZocivHHr4zTrNxMkOFZhPDnpvF6RVIDWTkqmH9Dg6qmro0gTJP75oKBpt7dFN2pW4zvqOAzqPmh7qxfoVusN8X6U13YirMFEf65-aGL-_FFNBsEzvjkC-BgXWbtk3YZc8CJL7xtvlKLyE6u6jC9Qx0SWnwzkALlxmzo_yYTDKpIrWiQGEqzLQOxKml-H0kSYLDX-t4sTivXp4vCw_ruoqwIpLnnQAC7q3ZtSTxHIrxbB7n_M8gfhpXtwprbPav-XmBk1xaQ
The dashboard is read-only, so you won't be able to edit anything; it's mostly for seeing what's going on and where.
You can also check current resource consumption with our Ganglia cluster monitoring tool:
https://ganglia.gi.ucsc.edu/
That website requires a username and password:
username: genecats
password: KiloKluster
That's mostly to keep script kiddies and bots from banging on it.
Once you get in, you should see a drop-down menu near the top left of the screen, next to "Genomics Institute Grid". From the drop-down menu, select "CG Kubernetes Cluster". It will take you to a page detailing the current resource usage and activity on the nodes. This can be useful for seeing whether anyone else is using the whole cluster, or just to get an idea of how many resources are available for your batch of jobs.
75c0e69be43a28e0f88ad02b2d5bd309cc6132c5
180
179
2020-03-13T22:49:05Z
Anovak
4
Talk about perf and NUMA issues when profiling
wikitext
text/x-wiki
__TOC__
The Computational Genomics Group has a Kubernetes Cluster running on several large instances in AWS. The current cluster makeup includes three worker nodes, each with the following specs:
* 96 CPU cores (3.1 GHz)
* 384 GB RAM
* 3.3 TB Local NVMe Flash Storage
* 25 Gb/s Network Interface
==Getting Authorized to Connect==
If you require access to this Kubernetes cluster, contact Benedict Paten to ask for permission to use it, then forward that permission via email to:
cluster-admin@soe.ucsc.edu
Let us know which group you are with and we can authorize you to use the cluster in the correct namespace.
==Authenticating to Kubernetes==
We will authorize (authz) you to use the cluster on the server side, but you will also need to authenticate (authn) using your '@ucsc.edu' email address and a unique JSON Web Token (JWT). These credentials are installed in ~/.kube/config on whatever machine you are connecting from.
To authenticate and get your base kubernetes configuration, go to this URL (below), which will ask you to authenticate to Google. Use your '@ucsc.edu' email address as the login. It will then ask you to authenticate via CruzID Gold if your web browser doesn't already have the authentication token cached:
https://cg-kube-auth.gi.ucsc.edu
Once you authenticate (via username/password and 2-factor auth for CruzID Gold), it will pass you back to the 'https://cg-kube-auth.gi.ucsc.edu' website and it should confirm authentication on the top with a message saying "Successfully Authenticated". '''If you see any errors in red,''' but are sure you typed in your password and 2-factor auth correctly, click on the above link again (https://cg-kube-auth.gi.ucsc.edu) and authenticate a second time, which should work. There is a quirk where the web token doesn't always pass back to us correctly on the first try.
Upon success, you will be able to click the blue "Download Config File" button, which contains your initial kubernetes config file. Copy this file to your home directory as ~/.kube/config. Follow the directions on the web page to insert your '''"namespace:"''' line as directed. We will let you know which namespace to use.
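For reference, the '''"namespace:"''' line goes under the context entry in ~/.kube/config. A sketch of the relevant section (the cluster, user, and namespace names here are placeholders, and your downloaded file's exact layout may differ slightly):

```yaml
contexts:
- context:
    cluster: example-cluster        # from the downloaded config
    user: you@ucsc.edu              # your login identity
    namespace: example-namespace    # add this line; we will tell you the value
  name: example-context
```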
==Testing Connectivity==
Once your ~/.kube/config file is set up correctly, you should be able to connect to the cluster. All our shared servers here at the Genomics Institute have the 'kubectl' command installed on them, but if you are coming from somewhere else, just make sure the "kubectl" utility is installed on that machine.
A quick test should go as follows:
 $ kubectl get nodes
 NAME          STATUS   ROLES    AGE   VERSION
 k1.kube       Ready    <none>   13h   v1.15.3
 k2.kube       Ready    <none>   13h   v1.15.3
 k3.kube       Ready    <none>   13h   v1.15.3
 master.kube   Ready    master   13h   v1.15.3
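Once the nodes are visible, you can also sanity-check your namespace and permissions with stock kubectl subcommands:

```shell
# Print the namespace selected by your current context
kubectl config view --minify --output 'jsonpath={..namespace}'
# Ask the API server whether you may create jobs there (prints "yes" or "no")
kubectl auth can-i create jobs
```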
==Running Pods and Jobs with Requests and Limits==
When running jobs and pods on Kubernetes, you will always want to specify "requests" and "limits" on resources; otherwise your pods will be stuck with the default limits, which are tiny (to protect against runaway pods). You should always have an idea of how many resources your jobs will consume, and not request much more than that, so you don't hog the cluster. Limits also prevent your job from unexpectedly "running away" and chewing up more resources than expected.
Here is a good example of a job file that specifies limits:
 job.yml
 apiVersion: batch/v1
 kind: Job
 metadata:
   name: $USER-$TS
 spec:
   backoffLimit: 0
   ttlSecondsAfterFinished: 30
   template:
     spec:
       containers:
       - name: magic
         image: robcurrie/ubuntu
         imagePullPolicy: Always
         resources:
           requests:
             cpu: "1"
             memory: "2G"
             ephemeral-storage: "2G"
           limits:
             cpu: "1"
             memory: "2G"
             ephemeral-storage: "3G"
         command: ["/bin/bash", "-c"]
         args: ['for i in {1..100}; do echo "$i: $(date)"; sleep 1; done']
       restartPolicy: Never
       priorityClassName: medium-priority
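Note that kubectl does not expand the $USER-$TS placeholders in the job name; they have to be filled in before submission. A minimal sketch of one way to do that from the shell (a suggestion, not site policy; job names must be lowercase DNS-safe labels):

```shell
# Build a unique, lowercase job name from your username and a timestamp.
TS=$(date +%s)
NAME=$(echo "${USER}-${TS}" | tr '[:upper:]' '[:lower:]' | tr '_' '-')
echo "Job name: ${NAME}"
# Then substitute it into the template and submit:
#   sed "s/\$USER-\$TS/${NAME}/" job.yml | kubectl apply -f -
```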
Please note that the "requests" and "limits" fields should be the same. You might think you could set the limit higher than the request, but in reality they need to match for the pod to stay within the resource envelope the Kubernetes scheduler planned for it. If you set the limit higher than the request, you risk the pod using more memory than the scheduler expects, and the node can start OOM-killing other colocated pods at random, which is very, very bad for the cluster. If you omit the "requests" section altogether, the limit values will be used as the requests, so if you use only one, use "limits".
Also note the "priorityClassName" line. Available values are:
high-priority
medium-priority
low-priority
That affects how quickly your jobs move up the queue in the event there are a lot of queued jobs. Always use "medium-priority" as the default unless you specifically know you need it higher or lower. Higher priority jobs will always go in front of lower priority jobs.
'''NOTE:''' Jobs and pods that '''completed''' over 72 hours ago but have not been cleaned up will be automatically removed by the garbage collector. Most jobs will have the "ttlSecondsAfterFinished" configuration item in them, so they will automatically be cleaned up after that time expires. Leaving old pods and jobs around pins the disk space they were using, so it's good to get rid of them as soon as they are done, unless you are debugging a failure or something like that.
Jobs that '''run''' over 72 hours will not be deleted, only the ones that have '''exited''' over 72 hours ago.
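If you want to clear out your own finished jobs before the garbage collector gets to them, the standard kubectl commands work (the job name below is a hypothetical example):

```shell
# Delete one finished job (and its pods) by name
kubectl delete job my-finished-job
# Or delete every job in your namespace that completed successfully
kubectl delete jobs --field-selector status.successful=1
```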
A lot of other good information can be found on Rob Currie's GitHub page, which includes examples and some "How To" documentation:
https://github.com/rcurrie/kubernetes
==Inlining Jobs in Shell and Shell in Jobs==
When interactively developing on Kubernetes, it can be useful to have a shell command you can copy and paste to run a Kubernetes job, rather than having to create YAML files on disk. Similarly, it can be useful to have shell scripting inline in your Kubernetes job definitions, rather than having to bake your experimental script into a Docker container. Here's an example that does both, putting the YAML inside a heredoc and putting the script to run in the container inside a multiline YAML string. We precede it with a command to delete the job, so you can modify your script and re-paste it to replace a failed or failing job. We also mount the AWS credentials in the container, so that the ''aws'' command will be able to access S3 if you install it.
kubectl delete job username-job
kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: username-job
spec:
  ttlSecondsAfterFinished: 1000
  template:
    spec:
      containers:
      - name: main
        imagePullPolicy: Always
        image: ubuntu:18.04
        command:
        - /bin/bash
        - -c
        - |
          set -e
          DEBIAN_FRONTEND=noninteractive apt-get update
          DEBIAN_FRONTEND=noninteractive apt-get install -y awscli cowsay
          cowsay "Listing files"
          aws s3 ls s3://vg-k8s/
        volumeMounts:
        - mountPath: /tmp
          name: scratch-volume
        - mountPath: /root/.aws
          name: s3-credentials
        resources:
          limits:
            cpu: 1
            memory: "4Gi"
            ephemeral-storage: "10Gi"
      restartPolicy: Never
      volumes:
      - name: scratch-volume
        emptyDir: {}
      - name: s3-credentials
        secret:
          secretName: shared-s3-credentials
  backoffLimit: 0
EOF
Make sure to replace "username-job" with a unique job name that includes ''your'' username.
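After pasting the job in, you can watch it run and fetch its output with the usual kubectl commands (again assuming the job name "username-job"; the Job controller automatically labels the job's pods with "job-name"):

```shell
# Check on the job and its pod
kubectl get job username-job
kubectl get pods -l job-name=username-job

# Stream the job's log output
kubectl logs -f job/username-job
```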
==View the Cluster's Current Activity==
One quick way to check the cluster's utilization is to do:
kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k1.kube 1815m 1% 1191Mi 0%
k2.kube 51837m 53% 46507Mi 12%
k3.kube 1458m 1% 61270Mi 15%
master.kube 111m 5% 1024Mi 46%
That means the worker nodes (k1, k2 and k3) are using minimal memory; k2 is using 53% of its CPU, but there is still plenty of room for new jobs. Ignore the master node, as it only handles cluster management and doesn't run jobs or pods for users.
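Note that the CPU(cores) column is in millicores: 1000m is one full core. As a quick sanity check on the k2 line:

```shell
# 51837 millicores is about 51.8 full cores in use on k2
awk 'BEGIN { print 51837 / 1000 }'
```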
Another good way to get a lot of details about the current state of the cluster is through the Kubernetes Dashboard:
https://cgl-k8s-dashboard.gi.ucsc.edu/
Select the "token" login method, and paste in this (long) token:
eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZC10b2tlbi0ycDY4cCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImE5ZGI2Y2I0LWMyYWYtNDM3My04ZmM2LWE4YWYwYTBmNGRkNCIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlcm5ldGVzLWRhc2hib2FyZDprdWJlcm5ldGVzLWRhc2hib2FyZCJ9.ZVQjryG_ksfvReIfq4Frb6M4sE6OVDOXnFy9Aii-h3mrpdHRE6bgjdAvSGZ0jJSIUEz5GgPBQ0lCwhyZocivHHr4zTrNxMkOFZhPDnpvF6RVIDWTkqmH9Dg6qmro0gTJP75oKBpt7dFN2pW4zvqOAzqPmh7qxfoVusN8X6U13YirMFEf65-aGL-_FFNBsEzvjkC-BgXWbtk3YZc8CJL7xtvlKLyE6u6jC9Qx0SWnwzkALlxmzo_yYTDKpIrWiQGEqzLQOxKml-H0kSYLDX-t4sTivXp4vCw_ruoqwIpLnnQAC7q3ZtSTxHIrxbB7n_M8gfhpXtwprbPav-XmBk1xaQ
The dashboard is read-only, so you won't be able to edit anything; it's mostly for seeing what's going on and where.
You can also take a look at current resource consumption by taking a look at our Ganglia Cluster monitor tool:
https://ganglia.gi.ucsc.edu/
That website requires a username and password:
username: genecats
password: KiloKluster
That's mostly for keeping the script kiddies and bots from banging on it.
Once you get in, you should see a drop-down menu near the top left of the screen, near "Genomics Institute Grid". From the drop-down menu, select "CG Kubernetes Cluster". It will take you to a page detailing the current resource usage and activity on the nodes. This can be useful for seeing whether anyone else is using the whole cluster, or just for getting an idea of how many resources are available for your batch of jobs.
==Profiling with Perf==
You can use Linux's "perf" to profile your code on the Kubernetes cluster. Here is an example of a job that does so. You need to obtain a "perf" binary that matches the version of the kernel that the Kubernetes ''hosts'' are running, which most likely does not correspond to any version of "perf" available in the Ubuntu repositories. Here we download a binary previously uploaded to S3. Also, the Kubernetes hosts have '''Non-Uniform Memory Access (NUMA)''': some physical memory is "closer" to some physical cores than to other physical cores. The system is divided into '''NUMA nodes''', each containing some cores and some memory. Memory access from a node to its own memory is significantly faster than memory access from a node to other nodes' memory. For consistent profiling, it is important to restrict your application to a single NUMA node if possible, with "numactl", so that all accesses are local to the NUMA node. If you don't do this, your application's performance will vary arbitrarily depending on whether and when threads are scheduled on the different NUMA nodes of the system.
apiVersion: batch/v1
kind: Job
metadata:
  name: username-profiling
spec:
  ttlSecondsAfterFinished: 1000
  template:
    metadata: # Apply a label saying that we use NUMA node 0
      labels:
        usesnuma0: "Yes"
    spec:
      affinity: # Say that we should not schedule on the same node as any other pod with that label
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: usesnuma0
                operator: In
                values:
                - "Yes"
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: main
        imagePullPolicy: Always
        image: ubuntu:18.04
        command:
        - /bin/bash
        - -c
        - |
          set -e
          DEBIAN_FRONTEND=noninteractive apt-get update
          DEBIAN_FRONTEND=noninteractive apt-get install -y awscli numactl
          # Use this particular perf binary that matches the hosts' kernels
          # If it is missing or outdated, get a new one from Erich or cluster-admin
          aws s3 cp s3://vg-k8s/users/adamnovak/projects/test/perf /usr/bin/perf
          chmod +x /usr/bin/perf
          # Do your work with perf here.
          # Use numactl to limit your code to NUMA node 0 for consistent memory access times
        volumeMounts:
        - mountPath: /tmp
          name: scratch-volume
        - mountPath: /root/.aws
          name: s3-credentials
        resources:
          limits:
            cpu: 24 # One NUMA node on our machines is 24 cores.
            memory: "150Gi"
            ephemeral-storage: "400Gi"
      restartPolicy: Never
      volumes:
      - name: scratch-volume
        emptyDir: {}
      - name: s3-credentials
        secret:
          secretName: shared-s3-credentials
  backoffLimit: 0
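As a sketch of what the "Do your work with perf here" section might contain (the binary name ./my_program is a placeholder for your own code), pinning to NUMA node 0 and profiling could look like:

```shell
# Show the host's NUMA topology (which cores and memory belong to which node)
numactl --hardware

# Run the program with both CPU scheduling and memory allocation
# restricted to NUMA node 0, recording a call-graph profile with perf
perf record -g -- numactl --cpunodebind=0 --membind=0 ./my_program

# Inspect the recorded profile
perf report
```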
=Overview of Getting and Using an AWS IAM Account=
__TOC__
== Getting AWS (Amazon Web Services) Access ==
The Genomics Institute has a series of AWS Accounts that all support different projects. Often if you become associated with one or more of those projects, you will need access to that account or accounts. The way we are managing AWS IAM Account Access is that we have one AWS account that is the 'top level' account that everyone gets access to, and then, once you log in there, you can "Switch Role" into another sub-account that you are running things in.
To get access, you will need your PI or Project Manager to email cluster-admin (cluster-admin@soe.ucsc.edu) asking for an AWS account for you, naming in that email the projects you will have access to. The cluster-admin group will contact you with your credentials to log in. Once you log in, you can change your password if you want to, and you will be able to set up MFA (Multi-Factor Authentication) for your account. You will be required to use MFA in order to "Switch Role" into any of the sub-accounts for the projects you are working on.
The login URL to use when logging in to the top level account is listed below. The top level account is known as "gi-gateway":
https://gi-gateway.signin.aws.amazon.com/console
When you login, you '''may''' see a couple error messages on the AWS dashboard saying you don't have access to view certain resources - '''this is normal''', so just ignore the error messages.
== Configuring Account Credentials ==
Once you login to the gi-gateway, you will have very few permissions to do anything there - which is normal, since you will not be working in that account anyway. The gi-gateway account is just there to authenticate you to AWS.
'''Changing Your Password'''
You can change your password by clicking on your username on the top right of the web browser window, just to the right of the little bell. If your username is bill@ucsc.edu, for example:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window.
* Click the '''"My Security Credentials"''' drop-down menu option.
* Click the '''"Change Password"''' button to change your password.
Note that we have a password strength policy in place, so your password must conform to the following requirements:
* Your password must be at least 10 characters long
* Your password must contain at least one lowercase letter
* Your password must contain at least one non-alphanumeric character
* Your password must contain at least one number
You will also need to configure MFA on your account before you will be allowed to switch roles into another account.
'''Configuring MFA'''
To configure MFA (Multi-Factor Authentication), the most common way is to use '''Google Authenticator''', an app available for Apple- and Android-based cell phones and mobile devices. The app is free; simply download it from the app store to your cell phone or tablet to get started. Other MFA apps may also work, but we have not tested everything out there.
Once you have Google Authenticator installed, log into the gi-gateway account using the above URL, then:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"My Security Credentials"''' drop-down menu option.
* Scroll down to the MFA (Multi-Factor Authentication) section of the page, and click '''"Assign MFA Device"'''.
* In the following menu select '''"Virtual MFA Device"'''.
* In the following window click the '''"Show QR Code"''' link, and the MFA QR barcode will appear on your screen.
* Open the Google Authenticator app on your mobile device, and click the little "+" symbol in the top right corner of the app to add an account.
* Select "Scan Barcode" in the Google Authenticator app to continue, and aim your mobile device camera at the QR barcode.
* The new account MFA device should then be set up, and you should see a 6-digit number with a small timer to the right of it. Type the 6-digit code it displays into your web browser when asked, then wait for the next code to appear after the timer expires, and type that into the second field. It should then inform you that you have successfully associated an MFA device with your account.
Once you have associated an MFA device with the 'gi-gateway' Account, '''log out''', then log back in. It will ask for your username and password, and then ask for your MFA code, which you can view by opening Google Authenticator and seeing what code it is displaying at that time. The code changes every 30 seconds or so. '''You must log out first and log back in using MFA in order to be able to switch roles!!!'''
== Switching Roles into Another AWS Account ==
Now that you have configured a password and enabled MFA, you will be allowed to "Switch Roles" into another account such that you can begin work there. The first time you switch roles into an account it will ask you a few questions, but subsequently it will remember which roles you have access to and they will become a menu item you can click on to quickly switch roles. Let's assume that you want to switch to the 'pangenomics' AWS account, and you have been already granted access to do so by the cluster-admin group. After logging into the 'gi-gateway' account at the URL listed here (same as above):
https://gi-gateway.signin.aws.amazon.com/console
Do the following to switch roles into the 'pangenomics' account (as an example):
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"Switch Role"''' option in the drop-down menu.
* In the following menu it will ask you about the role you will be assuming. In our example we will use the following:
-Account* = pangenomics
-Role* = developer
-Display Name = [leave blank, or use a short phrase]
-Color = [choose a color for this role]
* Then click the "Switch Role" button.
If all went well you should be dumped into the 'pangenomics' account, and you should be identified in the top right hand corner of the page as '''"developer @ pangenomics"''', indicating your role and the account you are active in. You can then work as normal in that account. If you have not yet been given access to that role, you will receive an error message and not be allowed to switch roles.
'''NOTE:''' When you switch roles, it may dump you into a region that you don't expect it to. Always verify the region you are in by looking at the top right of the web page - it will display your region there. Most of our stuff exists in "Oregon" (us-west-2), but some items appear in other regions on a per-case basis.
If you wish to switch context back to the 'gi-gateway' account in order to manage something, or to switch to another role in another account, simply:
* Click '''"developer @ pangenomics"''' in the top right corner of the window.
* Select '''"Back to bill@ucsc.edu"'''
You will then be sent back to the 'gi-gateway' context, and you can add another role to switch into, manage your credentials and further switch roles.
== API Access and Secret Keys ==
If you require programmatic access to AWS, you will very likely be familiar with the AWS concept of Access Keys and Secret Keys, which can be used by scripts to authenticate to AWS and use the APIs there without using the web console. In the past, access keys and secret keys could be used by users with no further authentication. This introduces a security risk, as those keys must be carefully guarded - if anyone gets your keys, they can rack up charges on your AWS account without your knowledge!
Using the "Assume Role" mechanism we are now using, Access Keys and Secret Keys can still be created by users '''while logged into the gi-gateway account only'''. Do not try to create keys while you have "Switched Roles" into another account. Keys you create in the top level 'gi-gateway' account will work for you in any sub-account you have access to switch roles to. You will need to do a little more configuration for your keys to work from a UNIX command line however.
When using the '''"awscli"''' command line tool, assuming you have it installed (the process of which is outside the scope of this document), you would use the steps outlined in this document to configure it:
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
That document has a lot of other really useful information in it - if you plan on using keys for API access, we highly recommend reading it through.
Note that we recommend awscli version 1.16.187 or later, as earlier versions have documented issues with profiles and MFA-related actions. You can determine your version of awscli by doing:
aws --version
In general, if you plan on using keys for API access, you will minimally need to configure the "aws" utility and then tweak the config a bit for our setup. To start, run "aws configure". It should look something like this:
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]:
Most folks do that to start. It creates two files:
~/.aws/config
~/.aws/credentials
Those two files are important to access AWS via the 'aws' command.
'''~/.aws/credentials'''
This file contains your access key and secret key, and should not need to be modified after running 'aws configure'. Your same keys can be used to access any roles in any accounts you have access to.
'''~/.aws/config'''
This file contains some account information you will need to tweak. You may want to configure something like this:
[default]
region = us-west-2
[profile pangenomics-developer]
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/bill@ucsc.edu
duration_seconds = 43200
The "role_arn" line contains the role and account number you are accessing. You can see a list of live account numbers here:
[[AWS Account List and Numbers]]
Find the account number you need and enter it on the role_arn line, as well as the role name. You will get the role name from the cluster-admin group when you get access.
The 'mfa_serial' line contains the identifier for your MFA device. It will always look like '''"arn:aws:iam::652235167018:mfa/[your_iam_username]"'''. The account number listed there will always be "652235167018" because that is the account number of the top level "gi-gateway" account.
The "duration_seconds" parameter sets your session token's lifetime to 43200 seconds (12 hours), which is the maximum you can request, although you can specify less. That means you will only have to authenticate with MFA once every 12 hours, rather than every time you run a command.
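As a quick check on the arithmetic:

```shell
# 43200 seconds divided by 3600 seconds per hour gives the 12-hour maximum
echo $((43200 / 3600))
```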
Once that is configured, you should be able to reference the profile you just created when using the aws command, like so:
$ aws s3 ls --profile pangenomics-developer
It will ask you for your MFA code and then run the command. The token it creates will be valid for 12 hours if you specified "duration_seconds = 43200" (if you omitted that line, the default session duration is one hour), so you can run other 'aws' CLI commands without re-authenticating with MFA for the duration of the session. After the session expires, you will need to authenticate via MFA again.
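If you don't want to type --profile on every command, the standard AWS_PROFILE environment variable selects a profile for the whole shell session:

```shell
# Select the profile for every subsequent aws command in this shell
export AWS_PROFILE=pangenomics-developer

# Now this behaves like "aws s3 ls --profile pangenomics-developer"
aws s3 ls
```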
==Tag Your Resources==
When you start using AWS resources (instances, networks, etc), it is very important that you "tag" your resources with the "Owner" tag (note the capital "O"). "Owner" is the key, and the value assigned to it will be your IAM username (i.e. your email address). So, for example, if I spin up an instance, I would tag it during or after creation with something like:
Owner = bob@ucsc.edu
If you do not tag your instances, '''they will automatically be terminated within 10 minutes.''' Tag your instances especially, but tag every resource you create! This allows us to perform accounting tasks much more easily and lets the Program Managers know which resources are controlled by whom.
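For example, an instance can be tagged from the command line with "aws ec2 create-tags" (the instance ID here is a placeholder for your own instance's ID):

```shell
# Add the required Owner tag to an existing instance
aws ec2 create-tags \
    --resources i-0123456789abcdef0 \
    --tags Key=Owner,Value=bob@ucsc.edu \
    --profile pangenomics-developer
```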
f46e9180a211296084ffb91c5cb5422b23ae02b2
175
157
2020-01-29T00:23:24Z
Weiler
3
/* API Access and Secret Keys */
wikitext
text/x-wiki
__TOC__
== Getting AWS (Amazon Web Services) Access ==
The Genomics Institute has a series of AWS Accounts that all support different projects. Often if you become associated with one or more of those projects, you will need access to that account or accounts. The way we are managing AWS IAM Account Access is that we have one AWS account that is the 'top level' account that everyone gets access to, and then, once you log in there, you can "Switch Role" into another sub-account that you are running things in.
To get access, you will need your PI or Project Manager to email cluster-admin (cluster-admin@soe.ucsc.edu) asking for an AWS account for you, and also in that email to name the projects you will have access to. The cluster-admin group will contact you with your credentials to login. Once you login, you can change your password if you want to and also you will be able to set up MFA (Multi Factor Authentication) for your account. You will be required to use MFA in order to "Switch Role" into any of the sub-accounts for the projects you are working on.
The login URL to use when logging in to the top level account is listed below. The top level account is known as "gi-gateway":
https://gi-gateway.signin.aws.amazon.com/console
When you login, you '''may''' see a couple error messages on the AWS dashboard saying you don't have access to view certain resources - '''this is normal''', so just ignore the error messages.
== Configuring Account Credentials ==
Once you login to the gi-gateway, you will have very few permissions to do anything there - which is normal, since you will not be working in that account anyway. The gi-gateway account is just there to authenticate you to AWS.
'''Changing Your Password'''
You can change your password by clicking on your username on the top right of the web browser window, just to the right of the little bell. If your username is bill@ucsc.edu, for example:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window.
* Click the '''"My Security Credentials"''' drop-down menu option.
* Click the '''"Change Password"''' button to change your password.
Note that we have a password strength policy in place, so your password must conform to the following requirements:
* Your password must be at least 10 characters long
* Your password must contain at least one lowercase letter
* Your password must contain at least one non-alphanumeric character
* Your password must contain at least one number
You will also need to configure MFA on your account before you will be allowed to switch roles into another account.
'''Configuring MFA'''
To configure MFA (Multi Factor Authentication), the most common way to do it is to use '''Google Authenticator''', which is an app available for Apple and Android based cell phones and mobile devices. The app is free, simply download it from the app store to your cell phone or tablet to get started. Other MFA apps may also work but we have not tested everything out there.
Once you have Google Authenticator installed, log into the gi-gateway account using the above URL, then:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"My Security Credentials"''' drop-down menu option.
* Scroll down to the MFA (Multi-Factor Authentication) section of the page, and click '''"Assign MFA Device"'''.
* In the following menu select '''"Virtual MFA Device"'''.
* In the following window click the '''"Show QR Code"''' link, and the MFA QR barcode will appear on your screen.
* Open the Google Authenticator app on your mobile device, and click the little "+" symbol in the top right corner
of the app to add an account.
* You will then need to select "Scan Barcode" in the Google Authenticaor app to continue, and aim your mobile device camera
at the QR barcode.
* The new account MFA device should then be set up and you should see a 6 digit number with a small timer to the right of it.
You must type in one 6 digit code that it displays into your web browser when asked, then wait for the next code to appear
after the timer expires, and type that into the second field. It should then inform you that you have successfully associated
an MFA device with your account.
Once you have associated an MFA device with the 'gi-gateway' Account, '''log out''', then log back in. It will ask for your username and password, and then ask for your MFA code, which you can view by opening Google Authenticator and seeing what code it is displaying at that time. The code changes every 30 seconds or so. '''You must log out first and log back in using MFA in order to be able to switch roles!!!'''
== Switching Roles into Another AWS Account ==
Now that you have configured a password and enabled MFA, you will be allowed to "Switch Roles" into another account such that you can begin work there. The first time you switch roles into an account it will ask you a few questions, but subsequently it will remember which roles you have access to and they will become a menu item you can click on to quickly switch roles. Let's assume that you want to switch to the 'pangenomics' AWS account, and you have been already granted access to do so by the cluster-admin group. After logging into the 'gi-gateway' account at the URL listed here (same as above):
https://gi-gateway.signin.aws.amazon.com/console
Do the following to switch roles into the 'pangenomics' account (as an example):
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"Switch Role"''' option in the drop-down menu.
* In the following menu it will ask you about the role you will be assuming. In our example we will use the following:
-Account* = pangenomics
-Role* = developer
-Display Name = [leave blank, or use a short phrase]
-Color = [choose a color for this role]
* Then click the "Switch Role" button.
If all went well you should be dumped into the 'pangenomics' account, and you should be identified in the top right hand corner of the page as '''"developer @ pangenomics"''', indicating your role and the account you are active in. You can then work as normal in that account. If you have not yet been given access to that role, you will receive an error message and not be allowed to switch roles.
'''NOTE:''' When you switch roles, it may dump you into a region that you don't expect it to. Always verify the region you are in by looking at the top right of the web page - it will display your region there. Most of our stuff exists in "Oregon" (us-west-2), but some items appear in other regions on a per-case basis.
If you wish to switch context back to the 'gi-gateway' account in order to manage something, or to switch to another role in another account, simple:
* Click '''"developer @ pangenomics"''' in the top right corner of the window.
* Select '''"Back to bill@ucsc.edu"'''
You will then be sent back to the 'gi-gateway' context, and you can add another role to switch into, manage your credentials and further switch roles.
== API Access and Secret Keys ==
If you require programmatic access to AWS, you will very likely be familiar with the AWS concept of Access Keys and Secret Keys, which can be used by scripts to authenticate yourself to AWS and use the APIs there without using the web console to authenticate. In the past, access keys and secret keys could be used by users with no further authentication. This introduces a security risk, as the management of those keys must be carefully guarded - if anyone gets your keys, they can rack up charges on your AWS account without your knowledge!
Using the "Assume Role" mechanism we are now using, Access Keys and Secret Keys can still be created by users '''while logged into the gi-gateway account only'''. Do not try to create keys while you have "Switched Roles" into another account. Keys you create in the top level 'gi-gateway' account will work for you in any sub-account you have access to switch roles to. You will need to do a little more configuration for your keys to work from a UNIX command line however.
It should be noted that we recommend awscli version 1.16.187 or later, as earlier versions have documented issues with using profiles and MFA related actions. You can determine your version of awscli by doing:
aws --version
Generically, if you plan on using keys for API Access, minimally you will need to configure the "aws" utility and then tweak the config a bit for our setup. To start, run "aws configure". It should look something like this:
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]:
Most folks do that to start. It creates two files:
~/.aws/config
~/.aws/credentials
Those two files are important to access AWS via the 'aws' command.
'''~/.aws/credentials'''
This file contains your access key and secret key, and should not need to be modified after running 'aws configure'. Your same keys can be used to access any roles in any accounts you have access to.
'''~/.aws/config'''
This file contains some account information you will need to tweak. You may want to configure something like this:
[default]
region = us-west-2
[profile pangenomics-developer]
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/bill@ucsc.edu
duration_seconds = 43200
The "role_arn" line contains the role and account number you are accessing. You can see a list of live account numbers here:
[[AWS Account List and Numbers]]
Find the account number you need and enter it on the role_arn line, as well as the role name. You will get the role name from the cluster-admin group when you get access.
The 'mfa_serial' line contains the identifier for your MFA device. It will always look like '''"arn:aws:iam::652235167018:mfa/[your_iam_username]"'''. The account number listed there will always be "652235167018" because that is the account number of the top level "gi-gateway" account.
The "duration_seconds" parameter says that your session token will be 43200 seconds long (12 hours). That means you will only have to authenticate with MFA once every 12 hours. 12 hours is the maximum you can request, although you can specify less than that. This means it won't ask you for MFA every time you run a command for the next 12 hours.
Once that is configured, you should be able to reference the profile you just created when using the aws command, like so:
$ aws s3 ls --profile pangenomics-developer
It will ask you for your MFA code and then run the command. Once you enter the MFA code, the token it creates will be valid for 12 hours if you specified "duration_seconds = 43200", or if you omitted that line, the default session duration is one hour, so you can run other 'aws' cli commands without the need to re-authenticate with MFA for the duration of the session. After the session expires, you will need to authenticate via MFA again.
==Tag Your Resources==
When you start using AWS resources (instances, networks, etc), it is very important that you "tag" your resources with the "Owner" tag (note the capital "O"). "Owner" is the key, and the value assigned to it will be your IAM username (i.e. your email address). So, for example, if I spin up an instance, I would tag it during or after creation with something like:
Owner = bob@ucsc.edu
If you do not tag your instances, '''they will automatically be terminated within 10 minutes.''' Tag your instances especially, but tag every resource you create! This allows us to perform accounting tasks much more easily and allows the Program Managers to know which resources are controlled by who.
d4cef684ad25a9e0af9425b79d020c8ece4fc600
176
175
2020-01-29T00:28:29Z
Weiler
3
/* API Access and Secret Keys */
wikitext
text/x-wiki
__TOC__
== Getting AWS (Amazon Web Services) Access ==
The Genomics Institute has a series of AWS Accounts that all support different projects. Often if you become associated with one or more of those projects, you will need access to that account or accounts. The way we are managing AWS IAM Account Access is that we have one AWS account that is the 'top level' account that everyone gets access to, and then, once you log in there, you can "Switch Role" into another sub-account that you are running things in.
To get access, you will need your PI or Project Manager to email cluster-admin (cluster-admin@soe.ucsc.edu) asking for an AWS account for you, and also in that email to name the projects you will have access to. The cluster-admin group will contact you with your credentials to login. Once you login, you can change your password if you want to and also you will be able to set up MFA (Multi Factor Authentication) for your account. You will be required to use MFA in order to "Switch Role" into any of the sub-accounts for the projects you are working on.
The login URL to use when logging in to the top level account is listed below. The top level account is known as "gi-gateway":
https://gi-gateway.signin.aws.amazon.com/console
When you login, you '''may''' see a couple error messages on the AWS dashboard saying you don't have access to view certain resources - '''this is normal''', so just ignore the error messages.
== Configuring Account Credentials ==
Once you login to the gi-gateway, you will have very few permissions to do anything there - which is normal, since you will not be working in that account anyway. The gi-gateway account is just there to authenticate you to AWS.
'''Changing Your Password'''
You can change your password by clicking on your username on the top right of the web browser window, just to the right of the little bell. If your username is bill@ucsc.edu, for example:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window.
* Click the '''"My Security Credentials"''' drop-down menu option.
* Click the '''"Change Password"''' button to change your password.
Note that we have a password strength policy in place, so your password must conform to the following requirements:
* Your password must be at least 10 characters long
* Your password must contain at least one lowercase letter
* Your password must contain at least one non-alphanumeric character
* Your password must contain at least one number
You will also need to configure MFA on your account before you will be allowed to switch roles into another account.
'''Configuring MFA'''
The most common way to configure MFA (Multi Factor Authentication) is with '''Google Authenticator''', a free app available for Apple and Android cell phones and mobile devices; simply download it from the app store to your cell phone or tablet to get started. Other MFA apps may also work, but we have not tested everything out there.
Once you have Google Authenticator installed, log into the gi-gateway account using the above URL, then:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"My Security Credentials"''' drop-down menu option.
* Scroll down to the MFA (Multi-Factor Authentication) section of the page, and click '''"Assign MFA Device"'''.
* In the following menu select '''"Virtual MFA Device"'''.
* In the following window click the '''"Show QR Code"''' link, and the MFA QR barcode will appear on your screen.
* Open the Google Authenticator app on your mobile device, and tap the little "+" symbol in the top right corner of the app to add an account.
* Select "Scan Barcode" in the Google Authenticator app, and aim your mobile device camera at the QR barcode.
* The new MFA device should then be set up, and you should see a 6-digit number with a small timer to the right of it. Type the 6-digit code it displays into your web browser when asked, wait for the next code to appear after the timer expires, and type that into the second field. It should then inform you that you have successfully associated an MFA device with your account.
Once you have associated an MFA device with the 'gi-gateway' account, '''log out''', then log back in. It will ask for your username and password, and then for your MFA code, which you can view by opening Google Authenticator and seeing what code it is displaying at that time. The code changes every 30 seconds. '''You must log out and log back in using MFA in order to be able to switch roles!'''
== Switching Roles into Another AWS Account ==
Now that you have configured a password and enabled MFA, you will be allowed to "Switch Roles" into another account so that you can begin work there. The first time you switch roles into an account it will ask you a few questions; subsequently it will remember which roles you have access to, and they will become menu items you can click to quickly switch roles. Let's assume that you want to switch to the 'pangenomics' AWS account, and that you have already been granted access to do so by the cluster-admin group. After logging into the 'gi-gateway' account at the URL listed here (same as above):
https://gi-gateway.signin.aws.amazon.com/console
Do the following to switch roles into the 'pangenomics' account (as an example):
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"Switch Role"''' option in the drop-down menu.
* In the following menu it will ask you about the role you will be assuming. In our example we will use the following:
** Account* = pangenomics
** Role* = developer
** Display Name = [leave blank, or use a short phrase]
** Color = [choose a color for this role]
* Then click the "Switch Role" button.
If all went well, you should land in the 'pangenomics' account, and you should be identified in the top right hand corner of the page as '''"developer @ pangenomics"''', indicating your role and the account you are active in. You can then work as normal in that account. If you have not yet been given access to that role, you will receive an error message and will not be allowed to switch roles.
'''NOTE:''' When you switch roles, you may land in a region you don't expect. Always verify the region you are in by looking at the top right of the web page - it will display your region there. Most of our resources exist in "Oregon" (us-west-2), but some items appear in other regions on a case-by-case basis.
If you wish to switch context back to the 'gi-gateway' account in order to manage something, or to switch to another role in another account, simply:
* Click '''"developer @ pangenomics"''' in the top right corner of the window.
* Select '''"Back to bill@ucsc.edu"'''
You will then be sent back to the 'gi-gateway' context, where you can manage your credentials or switch into another role.
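The role-switch steps above can also be captured as a bookmarkable deep link; AWS's signin endpoint accepts the account, role name, and display name as query parameters. A minimal sketch, reusing the 'pangenomics'/'developer' example from above:

```shell
# Build a bookmarkable "Switch Role" link for the example account and role
# above. The signin.aws.amazon.com/switchrole endpoint and its account/
# roleName/displayName parameters are standard AWS console behavior.
ACCOUNT="pangenomics"
ROLE="developer"
echo "https://signin.aws.amazon.com/switchrole?account=${ACCOUNT}&roleName=${ROLE}&displayName=${ROLE}@${ACCOUNT}"
```

Opening that URL while logged in to gi-gateway takes you straight to the confirmation page for that role.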
== API Access and Secret Keys ==
If you require programmatic access to AWS, you will very likely be familiar with the AWS concept of Access Keys and Secret Keys, which scripts can use to authenticate to AWS and call the APIs without going through the web console. In the past, access keys and secret keys could be used with no further authentication. This introduces a security risk: the keys must be carefully guarded, because anyone who gets your keys can rack up charges on your AWS account without your knowledge!
Under the "Assume Role" mechanism we now use, Access Keys and Secret Keys can still be created by users, '''but only while logged into the gi-gateway account'''. Do not try to create keys while you have "Switched Roles" into another account. Keys you create in the top level 'gi-gateway' account will work for you in any sub-account you have access to switch roles to. However, you will need a little more configuration before your keys will work from a UNIX command line.
To set up your access and secret keys for the first time (again, logged into the 'gi-gateway' account only), follow these instructions. Once you log into the gi-gateway web interface, click on your username in the top right corner of the browser window, then click "My Security Credentials". On that screen you will see an "Access Keys" section with one key listed. Delete that key (using the "Delete" button on the right side of the key), then create a new key using the "Create Access Key" button. It will show you your access and secret key only ONCE, so make sure to copy them somewhere safe.
We recommend awscli version 1.16.187 or later, as earlier versions have documented issues with profiles and MFA-related actions. You can determine your version of awscli with:
aws --version
If you plan on using keys for API access, you will minimally need to configure the "aws" utility and then tweak the config a bit for our setup. To start, run "aws configure". It should look something like this (enter the access and secret keys you created in the previous step):
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]:
Most folks do that to start. It creates two files:
~/.aws/config
~/.aws/credentials
Those two files are important to access AWS via the 'aws' command.
'''~/.aws/credentials'''
This file contains your access key and secret key, and should not need to be modified after running 'aws configure'. Your same keys can be used to access any roles in any accounts you have access to.
'''~/.aws/config'''
This file contains some account information you will need to tweak. You may want to configure something like this:
[default]
region = us-west-2
[profile pangenomics-developer]
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/bill@ucsc.edu
duration_seconds = 43200
The "role_arn" line contains the role and account number you are accessing. You can see a list of live account numbers here:
[[AWS Account List and Numbers]]
Find the account number you need and enter it on the role_arn line, as well as the role name. You will get the role name from the cluster-admin group when you get access.
The 'mfa_serial' line contains the identifier for your MFA device. It will always look like '''"arn:aws:iam::652235167018:mfa/[your_iam_username]"'''. The account number listed there will always be "652235167018" because that is the account number of the top level "gi-gateway" account.
The "duration_seconds" parameter sets your session token lifetime to 43200 seconds (12 hours). 12 hours is the maximum you can request, although you can specify less. This means you will only have to authenticate with MFA once every 12 hours, rather than every time you run a command.
Once that is configured, you should be able to reference the profile you just created when using the aws command, like so:
$ aws s3 ls --profile pangenomics-developer
It will ask you for your MFA code and then run the command. Once you enter the MFA code, the token it creates will be valid for 12 hours if you specified "duration_seconds = 43200" (if you omitted that line, the default session duration is one hour), so you can run other 'aws' CLI commands without re-authenticating with MFA for the duration of the session. After the session expires, you will need to authenticate via MFA again.
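To sanity-check a new profile before relying on it, you can stage the stanza in a scratch file first and then ask AWS who you are. The scratch path below is arbitrary, and aws sts get-caller-identity is a standard awscli command:

```shell
# Stage the example profile from above in a scratch file, check it looks
# right, then copy it into ~/.aws/config when you are happy with it.
cat > /tmp/aws-config-example <<'EOF'
[default]
region = us-west-2

[profile pangenomics-developer]
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/bill@ucsc.edu
duration_seconds = 43200
EOF
grep -c '^role_arn' /tmp/aws-config-example    # prints 1
# Once the profile is in ~/.aws/config, confirm the role is really assumed
# (this will prompt for your MFA code):
#   aws sts get-caller-identity --profile pangenomics-developer
```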
==Tag Your Resources==
When you start using AWS resources (instances, networks, etc), it is very important that you "tag" your resources with the "Owner" tag (note the capital "O"). "Owner" is the key, and the value assigned to it will be your IAM username (i.e. your email address). So, for example, if I spin up an instance, I would tag it during or after creation with something like:
Owner = bob@ucsc.edu
If you do not tag your instances, '''they will automatically be terminated within 10 minutes.''' Tag your instances especially, but tag every resource you create! This allows us to perform accounting tasks much more easily and lets the Program Managers know which resources are controlled by whom.
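Tagging can be done from the CLI as well as the console; aws ec2 create-tags is the standard command for tagging an existing resource. A sketch (the instance ID is a placeholder; substitute your own):

```shell
# Construct the required Owner tag; the value is your IAM username (email).
OWNER="bob@ucsc.edu"
TAG="Key=Owner,Value=${OWNER}"
echo "$TAG"    # -> Key=Owner,Value=bob@ucsc.edu
# Apply it to an existing instance (placeholder ID; needs a valid session):
#   aws ec2 create-tags --resources i-0123456789abcdef0 --tags "$TAG"
```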
c0c0a858e05697c8ad3a1e67d55c1a810d53999b
AWS Account List and Numbers
0
22
162
140
2019-11-04T17:43:51Z
Weiler
3
wikitext
text/x-wiki
This is a list of our currently available AWS accounts and their account numbers:
ucsc-bd2k : 862902209576
ucsc-toil-dev : 318423852362
ucsc-vg-dev : 781907127277
ucsc-platform-dev : 719818754276
comparative-genomics-dev : 162786355865
nanopore-dev : 270442831226
ucsc-cgp-production : 097093801910
platform-hca : 122796619775
anvil-dev : 608666466534
gi-gateway : 652235167018
pangenomics : 422448306679
braingeneers : 443872533066
ucsctreehouse : 238605363322
ucsc-bisti-dev : 851631505710
dockstore-dev : 635220370222
a85670292a54a2f7559175e958b3a0f1130a670b
173
162
2020-01-22T18:54:35Z
Weiler
3
wikitext
text/x-wiki
This is a list of our currently available AWS accounts and their account numbers:
ucsc-bd2k : 862902209576
ucsc-toil-dev : 318423852362
ucsc-vg-dev : 781907127277
ucsc-platform-dev : 719818754276
comparative-genomics-dev : 162786355865
nanopore-dev : 270442831226
ucsc-cgp-production : 097093801910
platform-sc : 122796619775
anvil-dev : 608666466534
gi-gateway : 652235167018
pangenomics : 422448306679
braingeneers : 443872533066
ucsctreehouse : 238605363322
ucsc-bisti-dev : 851631505710
dockstore-dev : 635220370222
91991c226298d664984773fb3188ddd34ac3d759
Requirement for users to get GI VPN access
0
9
166
88
2020-01-03T22:45:44Z
Haifang
1
wikitext
text/x-wiki
If you need VPN access to the Genomics Institute firewalled/secure area (aka the "Prism" or "CIRM" Environment), please make an appointment with the GI SysAdmin team by emailing ''cluster-admin@soe.ucsc.edu'' requesting access. There are several requirements to gaining access to the firewalled area - please complete all these requirements '''BEFORE''' coming to have the VPN software set up for your laptop.
Please use this checklist to make sure that you have completed all '''six''' requirements.
'''1'''. User info, your PI info and your PI's approval
'''2'''. NIH Public Security Refresher Course Certificate
'''3'''. Signed Genomics Institute VPN User Agreement
'''4'''. Signed NIH Genomic Data Sharing Policy Agreement
'''5'''. "eduroam" wireless network has been set up on your laptop
'''6'''. Installed the appropriate OpenVPN software on your laptop
'''1''': You are required to ask your PI or sponsor to email cluster-admin@soe.ucsc.edu requesting a VPN account for you - this email should include:
* Your name
* Your PI's name
* Your requested username (if your name is Jane Doe, then your username could be 'jdoe', for example)
* Your PI's approval for this access
* Any other access you need, such as a UNIX server account or access to OpenStack
'''2''': You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it when you come to your appointment to install the VPN software. You must complete the course in a single continuous sitting in order to be able to print the certificate at the end:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2018 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
'''3''': You need to print and sign the Genomics Institute VPN User Agreement and bring it with you to your VPN software installation appointment, located here for download:
[[Media:GI_VPN_Policy.pdf]]
'''4''': Please print, read, and sign the last page of the NIH Genomic Data Sharing Policy agreement, located here for download (just staple the pages together), and bring the signed document to your appointment. By signing the document you agree that you have read and understood the policies described therein and that you agree to abide by them:
[[Media:NIH_GDS_Policy.pdf]]
'''5''': You will need access to the "eduroam" wireless network '''prior''' to your appointment. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions that prevent it from functioning. Some other universities have such restrictions (notably UCSF), but most other wireless networks, including home wireless networks, should work fine.
'''6''': Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop:
You will need a laptop running OS X, Windows, or Ubuntu.
For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Latest Stable version.
For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
For Ubuntu, please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
We will correspond with you via email to schedule the appointment. The appointment can take up to 30 minutes per person, depending on whether any issues come up during the software setup. If you show up for your appointment without one (or more) of the above requirements completed, we will have to reschedule for a time after you have completed them.
'''PLEASE NOTE:''' Because of the overhead required in setting up VPN access, please only request access if you have an immediate need to work on data that exists behind the firewall. We have had a decent number of people request access and go through the setup but then never use it. In other words, please do not request access because "one day you might need it", but because you '''do''' actually need it!
'''ALSO NOTE:''' VPN accounts expire after one year from the date of first gaining access. To renew for another year you will need your PI/sponsor to send us a note asking for renewal.
84c3ef0009a0cae7a00957660f179551609fd939
Genomics Institute Computing Information
0
6
167
143
2020-01-14T22:44:10Z
Weiler
3
/* Kubernetes Information */
wikitext
text/x-wiki
Welcome to the Genomic Institute Computing Information Repository! Browse the below topics for help in the area you are curious about.
==Datacenter Migration==
*[[Public Genomics Institute Infrastructure Ready for Migration]]
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment (PRISM)==
*[[Access to the Firewalled Compute Servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
== Amazon Web Services Account Management ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
==Kubernetes Information==
*[[Computational Genomics Kubernetes Installation]]
*[[Undiagnosed Diseaase Project Kubernetes Installation]]
d6b12baa2020029b3e8f1937189280c193fa3161
169
167
2020-01-14T22:51:12Z
Weiler
3
/* Kubernetes Information */
wikitext
text/x-wiki
Welcome to the Genomic Institute Computing Information Repository! Browse the below topics for help in the area you are curious about.
==Datacenter Migration==
*[[Public Genomics Institute Infrastructure Ready for Migration]]
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment (PRISM)==
*[[Access to the Firewalled Compute Servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
== Amazon Web Services Account Management ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
==Kubernetes Information==
*[[Computational Genomics Kubernetes Installation]]
*[[Undiagnosed Disease Project Kubernetes Installation]]
967c26d84e22d95f9c460e4c208c532ce808677c
174
169
2020-01-23T17:16:54Z
Weiler
3
wikitext
text/x-wiki
Welcome to the Genomic Institute Computing Information Repository! Browse the below topics for help in the area you are curious about.
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment (PRISM)==
*[[Access to the Firewalled Compute Servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
== Amazon Web Services Account Management ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
==Kubernetes Information==
*[[Computational Genomics Kubernetes Installation]]
*[[Undiagnosed Disease Project Kubernetes Installation]]
645705ee1a12937901e139de5305890bac22bd4d
181
174
2020-04-28T16:58:55Z
Weiler
3
/* giCloud Openstack */
wikitext
text/x-wiki
Welcome to the Genomic Institute Computing Information Repository! Browse the below topics for help in the area you are curious about.
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment (PRISM)==
*[[Access to the Firewalled Compute Servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Account Management ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
==Kubernetes Information==
*[[Computational Genomics Kubernetes Installation]]
*[[Undiagnosed Disease Project Kubernetes Installation]]
1c117b5185b4177c9449dd1b7265af5ba7e0474b
Undiagnosed Diseaase Project Kubernetes Installation
0
24
168
2020-01-14T22:50:27Z
Weiler
3
Created page with "__TOC__ The Undiagnosed Disease Project (UDP) has a Kubernetes Cluster running on one large GPU server. The current cluster makeup includes one worker node with the followin..."
wikitext
text/x-wiki
__TOC__
The Undiagnosed Disease Project (UDP) has a Kubernetes Cluster running on one large GPU server. The current cluster makeup includes one worker node with the following specs:
* 72 CPU cores (3.1 GHz)
* 384 GB RAM
* 3.2 TB Local NVMe Flash Storage
* 4 NVIDIA GPUs
* 25 Gb/s Network Interface
==Getting Authorized to Connect==
If you require access to this kubernetes cluster, contact Benedict Paten asking for permission to use it, then pass on that permission via email to:
cluster-admin@soe.ucsc.edu
Let us know which group you are with and we can authorize you to use the cluster in the correct namespace.
'''NOTE:''' You need GI VPN Access to access this kubernetes installation.
==Authenticating to Kubernetes==
We will authorize (authz) you to use the cluster on the server side, but you will also need to authenticate (authn) using your '@ucsc.edu' email address and a unique JSON Web Token. These credentials are installed in ~/.kube/config on whatever machine you are connecting to the cluster from.
To authenticate and get your base kubernetes configuration, go to this URL (below), which will ask you to authenticate to Google. Use your '@ucsc.edu' email address as the login. It will then ask you to authenticate via CruzID Gold if your web browser doesn't already have the authentication token cached:
https://udp-kube-auth.gi.ucsc.edu
Once you authenticate (via username/password and 2-factor auth for CruzID Gold), it will pass you back to the 'https://udp-kube-auth.gi.ucsc.edu' website and it should confirm authentication on the top with a message saying "Successfully Authenticated". '''If you see any errors in red,''' but are sure you typed in your password and 2-factor auth correctly, click on the above link again (https://udp-kube-auth.gi.ucsc.edu) and authenticate a second time, which should work. There is a quirk where the web token doesn't always pass back to us correctly on the first try.
Upon success, you will be able to click the blue "Download Config File" button, which contains your initial kubernetes config file. Copy this file to your home directory as ~/.kube/config. Follow the directions on the web page to insert your '''"namespace:"''' line as directed. We will let you know which namespace to use.
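If you would rather check the edit than eyeball it, the namespace line lives under the context entry in the config file. A sketch against a scratch copy (the cluster, user, and namespace names here are placeholders; use the ones you are given):

```shell
# Minimal kubeconfig context fragment with the "namespace:" line inserted,
# staged in a scratch file so you can compare against your real ~/.kube/config.
cat > /tmp/kube-context-example <<'EOF'
contexts:
- context:
    cluster: udp-kube
    user: bill@ucsc.edu
    namespace: my-namespace
  name: udp-kube
EOF
grep 'namespace:' /tmp/kube-context-example
# kubectl can report the namespace of the active context in the real config:
#   kubectl config view --minify -o jsonpath='{..namespace}'
```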
==Testing Connectivity==
Once your ~/.kube/config file is set up correctly, you should be able to connect to the cluster. All our shared servers here at the Genomics Institute have the 'kubectl' command installed on them, but if you are coming from somewhere else, just make sure the "kubectl" utility is installed on that machine.
A quick test should go as follows:
$ kubectl get nodes
NAME          STATUS   ROLES    AGE   VERSION
k1.kube       Ready    <none>   13h   v1.15.3
k2.kube       Ready    <none>   13h   v1.15.3
k3.kube       Ready    <none>   13h   v1.15.3
master.kube   Ready    master   13h   v1.15.3
==Running Pods and Jobs with Requests and Limits==
When running jobs and pods on kubernetes, you will always want to specify "requests" and "limits" on resources; otherwise your pods will get stuck with the default limits, which are tiny (to protect against runaway pods). You should always have an idea of how much of each resource your jobs will consume, and not request much more than that, so as not to hog all the resources. It also prevents your job from "running away" unexpectedly and chewing up more resources than expected.
Here is a good example of a job file that specifies limits:
job.yml
apiVersion: batch/v1
kind: Job
metadata:
  name: $USER-$TS
spec:
  backoffLimit: 0
  ttlSecondsAfterFinished: 30
  template:
    spec:
      containers:
      - name: magic
        image: robcurrie/ubuntu
        imagePullPolicy: Always
        resources:
          requests:
            cpu: "1"
            memory: "2G"
            ephemeral-storage: "2G"
          limits:
            cpu: "1"
            memory: "2G"
            ephemeral-storage: "3G"
        command: ["/bin/bash", "-c"]
        args: ['for i in {1..100}; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done']
      restartPolicy: Never
Please note that the "request" and "limit" item fields should be the same. You would think that you could set the limit higher than the request, but in reality they need to match in order for the pod to stay within the kubernetes resource limit bubble. If you set the limit higher than the request, then you are risking the pod using more memory than the scheduler expects, and the node can start killing off random other colocated pods by way of OOM, which is very, very bad for the cluster.
'''NOTE:''' Jobs and pods that '''completed''' over 72 hours ago but have not been cleaned up will be automatically removed by the garbage collector. Most jobs will have the "ttlSecondsAfterFinished" configuration item in them, so they will automatically be cleaned up after that time expires, but leaving old pods and jobs around pins the disk space they were using for as long as they remain, so it's good to get rid of them as soon as they are done unless you are debugging a failure or something like that.
Jobs that '''run''' over 72 hours will not be deleted, only the ones that have '''exited''' over 72 hours ago.
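Note that kubectl does not expand the $USER and $TS placeholders in the job name itself; you substitute them before submitting. One way to do that is a sed pass (shown against an inline one-liner here; run the same sed over your real job.yml):

```shell
# Fill in the $USER and $TS placeholders, then pipe the result to kubectl.
# TS makes the job name unique per run. USER defaults to "bill" if unset.
USER="${USER:-bill}"
TS=$(date +%s)
echo 'name: $USER-$TS' | sed -e "s|[$]USER|${USER}|" -e "s|[$]TS|${TS}|"
# Real submission:
#   sed -e "s|[$]USER|${USER}|" -e "s|[$]TS|${TS}|" job.yml | kubectl apply -f -
#   kubectl logs job/${USER}-${TS}    # once the pod has started
```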
A lot of other good information can be viewed on Rob Currie's github page, which includes examples and some "How To" documentation:
https://github.com/rcurrie/kubernetes
==View the Cluster's Current Activity==
One quick way to check the cluster's utilization is to do:
kubectl top nodes
NAME             CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
udp-k8s-1        1815m        1%     1191Mi          0%
udp-k8s-master   111m         5%     1024Mi          46%
Ignore the master node as that one only handles cluster management and doesn't run jobs or pods for users.
Another good way to get a lot of details about the current state of the cluster is through the Kubernetes Dashboard:
https://cgl-k8s-dashboard.gi.ucsc.edu/
Select the "token" login method, and paste in this (long) token:
eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZC10b2tlbi0ycDY4cCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImE5ZGI2Y2I0LWMyYWYtNDM3My04ZmM2LWE4YWYwYTBmNGRkNCIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlcm5ldGVzLWRhc2hib2FyZDprdWJlcm5ldGVzLWRhc2hib2FyZCJ9.ZVQjryG_ksfvReIfq4Frb6M4sE6OVDOXnFy9Aii-h3mrpdHRE6bgjdAvSGZ0jJSIUEz5GgPBQ0lCwhyZocivHHr4zTrNxMkOFZhPDnpvF6RVIDWTkqmH9Dg6qmro0gTJP75oKBpt7dFN2pW4zvqOAzqPmh7qxfoVusN8X6U13YirMFEf65-aGL-_FFNBsEzvjkC-BgXWbtk3YZc8CJL7xtvlKLyE6u6jC9Qx0SWnwzkALlxmzo_yYTDKpIrWiQGEqzLQOxKml-H0kSYLDX-t4sTivXp4vCw_ruoqwIpLnnQAC7q3ZtSTxHIrxbB7n_M8gfhpXtwprbPav-XmBk1xaQ
The dashboard is read-only, so you won't be able to edit anything, it's mostly for seeing what's going on and where.
35b4d6f38a88cc371e8933e24f763396e22b36a9
Undiagnosed Disease Project Kubernetes Installation
0
25
170
2020-01-14T22:51:33Z
Weiler
3
Created page with "__TOC__ The Undiagnosed Disease Project (UDP) has a Kubernetes Cluster running on one large GPU server. The current cluster makeup includes one worker node with the followin..."
wikitext
text/x-wiki
__TOC__
The Undiagnosed Disease Project (UDP) has a Kubernetes Cluster running on one large GPU server. The current cluster makeup includes one worker node with the following specs:
* 72 CPU cores (3.1 GHz)
* 384 GB RAM
* 3.2 TB Local NVMe Flash Storage
* 4 NVIDIA GPUs
* 25 Gb/s Network Interface
==Getting Authorized to Connect==
If you require access to this kubernetes cluster, contact Benedict Paten asking for permission to use it, then pass on that permission via email to:
cluster-admin@soe.ucsc.edu
Let us know which group you are with and we can authorize you to use the cluster in the correct namespace.
'''NOTE:''' You need GI VPN Access to access this kubernetes installation.
==Authenticating to Kubernetes==
We will authorize (authz) you to use the cluster on the server side, but you will also need to authenticate (authn) using your '@ucsc.edu' email address and a unique JSON Web Token. These credentials are installed in ~/.kube/config on whatever machine you are connecting to the cluster from.
To authenticate and get your base kubernetes configuration, go to this URL (below), which will ask you to authenticate to Google. Use your '@ucsc.edu' email address as the login. It will then ask you to authenticate via CruzID Gold if your web browser doesn't already have the authentication token cached:
https://udp-kube-auth.gi.ucsc.edu
Once you authenticate (via username/password and 2-factor auth for CruzID Gold), it will pass you back to the 'https://udp-kube-auth.gi.ucsc.edu' website and it should confirm authentication on the top with a message saying "Successfully Authenticated". '''If you see any errors in red,''' but are sure you typed in your password and 2-factor auth correctly, click on the above link again (https://udp-kube-auth.gi.ucsc.edu) and authenticate a second time, which should work. There is a quirk where the web token doesn't always pass back to us correctly on the first try.
Upon success, you will be able to click the blue "Download Config File" button, which contains your initial kubernetes config file. Copy this file to your home directory as ~/.kube/config. Follow the directions on the web page to insert your '''"namespace:"''' line as directed. We will let you know which namespace to use.
==Testing Connectivity==
Once your ~/.kube/config file is set up correctly, you should be able to connect to the cluster. All our shared servers here at the Genomics Institute have the 'kubectl' command installed on them, but if you are coming from somewhere else, just make sure the "kubectl" utility is installed on that machine.
A quick test should go as follows:
$ kubectl get nodes
NAME          STATUS   ROLES    AGE   VERSION
k1.kube       Ready    <none>   13h   v1.15.3
k2.kube       Ready    <none>   13h   v1.15.3
k3.kube       Ready    <none>   13h   v1.15.3
master.kube   Ready    master   13h   v1.15.3
==Running Pods and Jobs with Requests and Limits==
When running jobs and pods on kubernetes, you will always want to specify "requests" and "limits" on resources; otherwise your pods will get stuck with the default limits, which are tiny (to protect against runaway pods). You should always have an idea of how much of each resource your jobs will consume, and not request much more than that, so as not to hog all the resources. It also prevents your job from "running away" unexpectedly and chewing up more resources than expected.
Here is a good example of a job file that specifies limits:
job.yml
apiVersion: batch/v1
kind: Job
metadata:
  name: $USER-$TS
spec:
  backoffLimit: 0
  ttlSecondsAfterFinished: 30
  template:
    spec:
      containers:
      - name: magic
        image: robcurrie/ubuntu
        imagePullPolicy: Always
        resources:
          requests:
            cpu: "1"
            memory: "2G"
            ephemeral-storage: "2G"
          limits:
            cpu: "1"
            memory: "2G"
            ephemeral-storage: "3G"
        command: ["/bin/bash", "-c"]
        args: ['for i in {1..100}; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done']
      restartPolicy: Never
Please note that the "request" and "limit" item fields should be the same. You would think that you could set the limit higher than the request, but in reality they need to match in order for the pod to stay within the kubernetes resource limit bubble. If you set the limit higher than the request, then you are risking the pod using more memory than the scheduler expects, and the node can start killing off random other colocated pods by way of OOM, which is very, very bad for the cluster.
'''NOTE:''' Jobs and pods that have '''completed''' over 72 hours ago but have not been cleaned up will be automatically removed by the garbage collector. Most jobs will have the "ttlSecondsAfterFinished" configuration item in them, so they will automatically cleaned up after that time expires, but leaving the old pods and jobs around pins the disk space they were using while they remain, so it's good to get rid of them as soon as they are done unless you are debugging a failure or something like that.
Jobs that '''run''' over 72 hours will not be deleted, only the ones that have '''exited''' over 72 hours ago.
A lot of other good information can be viewed on Rob Currie's github page, which includes examples and some "How To" documentation:
https://github.com/rcurrie/kubernetes
==View the Cluster's Current Activity==
One quick way to check the cluster's utilization is to do:
kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
udp-k8s-1 1815m 1% 1191Mi 0%
udp-k8s-master 111m 5% 1024Mi 46%
Ignore the master node as that one only handles cluster management and doesn't run jobs or pods for users.
Another good way to get a lot of details about the current state of the cluster is through the Kubernetes Dashboard:
https://cgl-k8s-dashboard.gi.ucsc.edu/
Select the "token" login method, and paste in this (long) token:
eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZC10b2tlbi0ycDY4cCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImE5ZGI2Y2I0LWMyYWYtNDM3My04ZmM2LWE4YWYwYTBmNGRkNCIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlcm5ldGVzLWRhc2hib2FyZDprdWJlcm5ldGVzLWRhc2hib2FyZCJ9.ZVQjryG_ksfvReIfq4Frb6M4sE6OVDOXnFy9Aii-h3mrpdHRE6bgjdAvSGZ0jJSIUEz5GgPBQ0lCwhyZocivHHr4zTrNxMkOFZhPDnpvF6RVIDWTkqmH9Dg6qmro0gTJP75oKBpt7dFN2pW4zvqOAzqPmh7qxfoVusN8X6U13YirMFEf65-aGL-_FFNBsEzvjkC-BgXWbtk3YZc8CJL7xtvlKLyE6u6jC9Qx0SWnwzkALlxmzo_yYTDKpIrWiQGEqzLQOxKml-H0kSYLDX-t4sTivXp4vCw_ruoqwIpLnnQAC7q3ZtSTxHIrxbB7n_M8gfhpXtwprbPav-XmBk1xaQ
The dashboard is read-only, so you won't be able to edit anything, it's mostly for seeing what's going on and where.
35b4d6f38a88cc371e8933e24f763396e22b36a9
171
170
2020-01-14T23:08:29Z
Weiler
3
wikitext
text/x-wiki
__TOC__
The Undiagnosed Disease Project (UDP) has a Kubernetes Cluster running on one large GPU server. The current cluster makeup includes one worker node with the following specs:
* 72 CPU cores (3.1 GHz)
* 384 GB RAM
* 3.2 TB Local NVMe Flash Storage
* 4 NVIDIA GPUs
* 25 Gb/s Network Interface
==Getting Authorized to Connect==
If you require access to this Kubernetes cluster, contact Benedict Paten to ask for permission to use it, then forward that permission via email to:
cluster-admin@soe.ucsc.edu
Let us know which group you are with and we can authorize you to use the cluster in the correct namespace.
'''NOTE:''' You need GI VPN Access to access this kubernetes installation.
==Authenticating to Kubernetes==
We will authorize (authz) you to use the cluster on the server side, but you will also need to authenticate (authn) using your '@ucsc.edu' email address and a unique JSON Web Token (JWT). These credentials are installed in ~/.kube/config on whatever machine you use to reach the cluster.
To authenticate and get your base kubernetes configuration, go to this URL (below), which will ask you to authenticate to Google. Use your '@ucsc.edu' email address as the login. It will then ask you to authenticate via CruzID Gold if your web browser doesn't already have the authentication token cached:
https://udp-kube-auth.gi.ucsc.edu
Once you authenticate (via username/password and 2-factor auth for CruzID Gold), it will pass you back to the https://udp-kube-auth.gi.ucsc.edu website, which should confirm authentication with a "Successfully Authenticated" message at the top. '''If you see any errors in red''' but are sure you typed your password and 2-factor auth correctly, click the link above again (https://udp-kube-auth.gi.ucsc.edu) and authenticate a second time, which should work; there is a quirk where the web token doesn't always pass back to us correctly on the first try.
Upon success, you will be able to click the blue "Download Config File" button, which contains your initial kubernetes config file. Copy this file to your home directory as ~/.kube/config. Follow the directions on the web page to insert your '''"namespace:"''' line as directed. We will let you know which namespace to use.
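For reference, the "namespace:" line goes inside the context entry of the downloaded file. All names below are illustrative placeholders; the real values come from the file you downloaded:

```yaml
# Sketch of the relevant fragment of ~/.kube/config.
# The cluster/user/context names are placeholders standing in for whatever
# the downloaded file contains; only the "namespace:" line is added by hand.
contexts:
- context:
    cluster: udp-cluster          # as downloaded
    user: you@ucsc.edu            # as downloaded
    namespace: your-namespace     # the line you add; we tell you the value
  name: udp-context
```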
==Testing Connectivity==
Once your ~/.kube/config file is set up correctly, you should be able to connect to the cluster. All our shared servers here at the Genomics Institute have the 'kubectl' command installed on them, but if you are coming from somewhere else, just make sure the "kubectl" utility is installed on that machine.
A quick test should go as follows:
 $ kubectl get nodes
 NAME          STATUS   ROLES    AGE   VERSION
 k1.kube       Ready    <none>   13h   v1.15.3
 k2.kube       Ready    <none>   13h   v1.15.3
 k3.kube       Ready    <none>   13h   v1.15.3
 master.kube   Ready    master   13h   v1.15.3
==Running Pods and Jobs with Requests and Limits==
When running jobs and pods on Kubernetes, always specify resource "requests" and "limits"; otherwise your pods will be stuck with the default limits, which are tiny (to protect against runaway pods). Know roughly how much CPU, memory, and storage your jobs will consume and request not much more than that, so you don't hog the cluster's resources. Limits also prevent your job from "running away" unexpectedly and chewing up more resources than expected.
Here is a good example of a job file that specifies limits:
job.yml
 apiVersion: batch/v1
 kind: Job
 metadata:
   name: $USER-$TS
 spec:
   backoffLimit: 0
   ttlSecondsAfterFinished: 30
   template:
     spec:
       containers:
       - name: magic
         image: robcurrie/ubuntu
         imagePullPolicy: Always
         resources:
           requests:
             cpu: "1"
             memory: "2G"
             ephemeral-storage: "2G"
           limits:
             cpu: "1"
             memory: "2G"
             ephemeral-storage: "3G"
         command: ["/bin/bash", "-c"]
         args: ['for i in {1..100}; do echo "$i: $(date)"; sleep 1; done']
       restartPolicy: Never
Please note that the "requests" and "limits" fields should be the same. You might think you could set the limit higher than the request, but they need to match so that the pod lands in Kubernetes' "Guaranteed" QoS class and stays within the resources the scheduler accounted for. If you set the limit higher than the request, the pod can use more memory than the scheduler expects, and the node can start OOM-killing random other colocated pods, which is very bad for the cluster.
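The name: $USER-$TS field in the example is a placeholder: Kubernetes object names must be lowercase DNS-1123 labels, so substitute a real value before submitting. A minimal sketch, assuming you template job.yml with sed (any templating method works):

```shell
# Build a valid, unique job name: lowercase username plus a timestamp.
TS=$(date +%Y%m%d-%H%M%S)
NAME="$(echo "${USER:-nobody}" | tr 'A-Z' 'a-z')-$TS"
echo "job name: $NAME"
# Submitting the job itself requires cluster access, e.g.:
#   sed "s/\$USER-\$TS/$NAME/" job.yml | kubectl create -f -
#   kubectl get jobs        # then watch for the job to appear
```

Deleting the job with "kubectl delete job" once you are done with it frees the disk space its pods were pinning.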
'''NOTE:''' Jobs and pods that '''completed''' over 72 hours ago but have not been cleaned up will be automatically removed by the garbage collector. Most jobs will have the "ttlSecondsAfterFinished" configuration item in them, so they will automatically be cleaned up after that time expires. Leaving old pods and jobs around pins the disk space they were using, so get rid of them as soon as they are done unless you are debugging a failure or something like that.
Jobs that are still '''running''' after 72 hours will not be deleted; only jobs that '''exited''' over 72 hours ago are removed.
A lot of other good information is available on Rob Currie's GitHub page, which includes examples and some "How To" documentation:
https://github.com/rcurrie/kubernetes
==View the Cluster's Current Activity==
One quick way to check the cluster's utilization is to do:
 $ kubectl top nodes
 NAME             CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
 udp-k8s-1        1815m        1%     1191Mi          0%
 udp-k8s-master   111m         5%     1024Mi          46%
Ignore the master node; it only handles cluster management and doesn't run jobs or pods for users. (CPU is reported in millicores, so 1815m is about 1.8 cores.)
Another good way to get a lot of details about the current state of the cluster is through the Kubernetes Dashboard:
https://udp-k8s-dashboard.gi.ucsc.edu/
Note that you need to be connected to the VPN for that URL to work.
Select the "token" login method, and paste in this (long) token:
eyJhbGciOiJSUzI1NiIsImtpZCI6ImY3MmxpdUQzb1dSUUp0NlFVUlI1blJVaUItX1pUX2JJMGRhY0ZOc3B6MDQifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZC10b2tlbi14OG52ZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6IjAwZDQ1ZTI1LWU5MmMtNDMxZC1iYjE2LWQ3ZWViZTRkNDgzMiIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlcm5ldGVzLWRhc2hib2FyZDprdWJlcm5ldGVzLWRhc2hib2FyZCJ9.Ty9vSYU59_uqBar9OSc0wlenhGm1-aSUCPb8SZf6nhE8VVSt4TPCrXeL2SEsI_u6JAEeOBVJvVof52XSoU84RM8-e3ZWmr57LfjlEh5tPyJXPijCR_x3K0fXV-vpUUV69s7PHoLIy8UaoXOGbxm0O_731fnMenNtNbDDiWXjW9mXhklUG9mxDEipfKW76B_ZmuEkYuAP6BiNPuYc1K6x3m5x4QpkLe3MhBi0tTCkG5q1RU8S63FE3deRcl7VVvGoCENPq9vMJpOEqsVDBotEGwOca4UG7cCOeSSHwOz2aLHeP0CXZehWp9d7GggnrnknKJHrtXZ7-WiIABSPe2GJow
The dashboard is read-only, so you won't be able to edit anything; it's mostly for seeing what's going on and where.
=Quick Start Instructions to Get Rolling with OpenStack=
__TOC__
==Request an OpenStack Account==
Once you have [http://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access PRISM/GI VPN access], you can request an OpenStack account. Send an email to cluster-admin@soe.ucsc.edu asking for access, and let us know which lab you are in or who your PI is, so we can place you in the right OpenStack group.
==Create a SSH Public/Private Keypair==
To log into an OpenStack VM instance, you will need an SSH public key. The key is "injected" into the instance upon creation, and only someone with the matching private key (i.e. you) will be able to log in via SSH initially. If you already have an SSH keypair that you use elsewhere, you can use it and skip to the next step. If you don't have an SSH keypair yet, log into the UNIX-compatible machine you will be '''logging in from''' (a Mac/Apple computer will also work) and run the 'ssh-keygen' command. If you are behind the VPN, you can first log into mustard, crimson, or razzmatazz, which are Linux servers. The command will look something like this:
 $ ssh-keygen -t rsa
 Generating public/private rsa key pair.
 Enter file in which to save the key (/public/home/frank/.ssh/id_rsa):
 Created directory '/public/home/frank/.ssh'.
 Enter passphrase (empty for no passphrase): [JUST HIT ENTER]
 Enter same passphrase again: [JUST HIT ENTER]
 Your identification has been saved in /public/home/frank/.ssh/id_rsa.
 Your public key has been saved in /public/home/frank/.ssh/id_rsa.pub.
 The key fingerprint is:
 SHA256:dhJG1A3gcwj7Mz17ommt3NIczMVVgrzp8Tf6F1X4jpI
 The key's randomart image is:
 +---[RSA 2048]----+
 | ..+o.+ ..o.|
 | = .. + o..|
 | . * .. + ..|
 | o = * o|
 | So+o + o.|
 | . =+oE ooo|
 | +o.....o|
 | .o++o . .|
 | .=o. ...|
 +----[SHA256]-----+
You will then have a new directory, "~/.ssh", and inside that directory you will have a file called "id_rsa.pub". That is your SSH public key. You will need this in the next step in order to set up your key in OpenStack.
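If you'd rather skip the prompts, the same keypair can be generated non-interactively. A sketch using standard ssh-keygen flags; the scratch path is illustrative, use ~/.ssh/id_rsa for real use:

```shell
# Non-interactive version of the transcript above:
# 2048-bit RSA key, empty passphrase (-N ""), quiet mode (-q).
keyfile="$(mktemp -d)/id_rsa"
ssh-keygen -q -t rsa -b 2048 -N "" -f "$keyfile"
cat "${keyfile}.pub"    # this is the public key you will upload to OpenStack
```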
==Log In To giCloud==
Once you have been notified that your account has been set up and have been given login credentials, connect to the VPN and then go to this link, which is the login page:
https://gicloud.prism
To log in, enter your username and password. You will also see a "Domain" field; enter the word "default" there. Click "Log In" and you will be taken to your group's summary page.
==Upload your SSH Public Key==
After creating your key in the "Create a SSH Public/Private Keypair" step above, you need to upload it into OpenStack. Once you are logged in, in the left-hand navigation menu click "Project", then in the submenu select "Compute", and finally select "Key Pairs". This takes you to the "Key Pairs" window shown here.
[[File:Keypairs.png|900px]]
Next, click the "Import Public Key" button at the top right of the window. In the resulting dialog, name your key in the "Key Pair Name" field. Use something descriptive, like "laptop-key" if the key is on your laptop, or "mustard-key" if you are logged into mustard.
To get your key, open a terminal window and run "cat ~/.ssh/id_rsa.pub" to print the full key:
$ cat ~/.ssh/id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAyVKNfdBbDIk7Iq8JmL+u3vxAn4M1iaQgMU5tHJhMSAYBZEZRLZAFc+Qovxe5zzs1ixte9lCipLep39q2I4U8XND17nYliZ4HVM4MW4GsMUfKsgX2FI3mB2vAQ9pZSLkAhTg2D+92uALUSSv1cDZhTqo7DuPRX2Upxyd5QbRL6TRFswBjHz2vY/JpaPQm1S1d10mokPpmxehLfwp0mVgmz1Uv/6FflqiZ68DhDN67cs1yQgWYXQ01IHPjzTKRwCuZVkgT99rkoqy6TkAyrvsfzYPZbIA2y+ovOBzq6WCUT9gp5Jx/UE6CxLSmAuGPAJkV5D/twKIe75xc+5jdi3I1cgKw== user@laptop
Copy that whole line, from "ssh-rsa" through the very last character, including the trailing "user@laptop" comment (which will be different for you; just be sure to include it).
Back in the OpenStack Key Pair dialog, paste the key into the "Public Key" field and click "Import Key". The key should then appear in the key list.
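A common mistake is pasting a wrapped or partial key. Before pasting, you can sanity-check that the key is exactly one line and starts with the key type. The sketch below uses a stand-in key file so it is self-contained; for the real check, point it at ~/.ssh/id_rsa.pub:

```shell
# Stand-in public key file for illustration (use ~/.ssh/id_rsa.pub instead).
PUB="$(mktemp)"
printf 'ssh-rsa AAAAB3NzaC1yc2EAAAAB...example user@laptop\n' > "$PUB"

# A valid key to paste is a single line whose first field is the key type.
lines=$(wc -l < "$PUB")
keytype=$(cut -d' ' -f1 "$PUB")
if [ "$lines" -eq 1 ] && [ "$keytype" = "ssh-rsa" ]; then
    echo "key looks OK"
else
    echo "key is malformed; re-copy the whole line" >&2
fi
rm -f "$PUB"
```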
==Launch a New Instance==
We are now ready to launch a new VM instance. In the left navigation menu, select "Project", then in the submenu select "Compute", and finally select "Instances". You will see any instances currently running in your group.
==Log In To giCloud==
Once you have been notified that your account has been set up and have been given login credentials, connect to the VPN and then go to this link in your favorite web browser, which is the login page:
https://gicloud.prism
To login, enter your username and password. Also you will see a "Domain" field, just enter the word "default" for the domain. Click "Log In". You will be logged into your group's summary page.
==Upload your SSH Public Key==
After creating your new key in the above "Create a SSH Public/Private Keypair" step, you will need to upload that key into OpenStack. Once you are logged in, on the left hand navigation menu, click "Project", then in the submenu, select "Compute", and finally select "Key Pairs". It should take you to the "Key Pairs" window as shown here.
[[File:Keypairs.png|900px]]
Next click the "Import Public Key" button on the top right of the window. In the resulting window, name your key in the "Key Pair Name" field. Name it something descriptive like "laptop-key" if the key is on your laptop, or "mustard-key" if you are logged into mustard, etc.
To get your key, open a terminal window and type "cat ~/.ssh/id_rsa.pub" to get your full key, as so:
$ cat ~/.ssh/id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAyVKNfdBbDIk7Iq8JmL+u3vxAn4M1iaQgMU5tHJhMSAYBZEZRLZAFc+Qovxe5zzs1ixte9lCipLep39q2I4U8XND17nYliZ4HVM4MW4GsMUfKsgX2FI3mB2vAQ9pZSLkAhTg2D+92uALUSSv1cDZhTqo7DuPRX2Upxyd5QbRL6TRFswBjHz2vY/JpaPQm1S1d10mokPpmxehLfwp0mVgmz1Uv/6FflqiZ68DhDN67cs1yQgWYXQ01IHPjzTKRwCuZVkgT99rkoqy6TkAyrvsfzYPZbIA2y+ovOBzq6WCUT9gp5Jx/UE6CxLSmAuGPAJkV5D/twKIe75xc+5jdi3I1cgKw== user@laptop
Copy that whole line, starting with "ssh-rsa" all the way through the very last character, including the "user@laptop" bit (which may be different for you, just be sure to include it in the line copy).
Then back in the OpenStack Key Pair dialogue window, paste in the keypair in the "Public Key" window, then click "Import Key". The key should then appear in the key list.
==Launch a New Instance==
We are now ready to launch our new VM instance. On the left navigation menu, select "Project", then in the submenu, select "Compute", and finally select "Instances". You will see any currently running instances in your group in the resulting screen.
Next you need to click the "Launch Instance" button on the top right. You will be put into the "Details" tab in the instance creation dialogue. You need to choose an instance name and enter it into the "Instance Name" field. It should include your username as a prefix so that others know who owns each instance. Something like "frank-newtest1" would work well. You can ignore the "Description" field, "Availability Zone" should be "nova: and "Count" should be "1".
Next click the "Source" tab on the left. In the "Source" menu, in the "Select Boot Source" field, select "Image". Then in the below list of images, choose your image and click the little "Up Arrow" icon to the right of the image you want to add it.
Next click the "Flavor" tab on the left. In that menu, choose how much CPU, RAM and disk space you want for your new VM. Some images have minimum requirements, and as such some of the smaller flavors may not be available. Select your flavor by clicking the little "Up Arrow" icon on the right of your flavor.
Next click the "Key Pair" tab on the left. Click the little "Up Arrow" to the right of the Kep Pair you created in the previous step where you create a Key Pair.
Ignore the rest of the options on the left, you have configured all you need to launch the instance. Click the blue "Launch Instance" button on the bottom right of your window.
b2e608fe1c0e93dfc1886c35c26ceb33039ae6c8
File:Keypairs.png
6
27
190
2020-04-28T20:31:50Z
Weiler
3
keypairs.png
wikitext
text/x-wiki
keypairs.png
acb448a9211058dbaa0565c92046ecfceeffd023
File:Launch.png
6
28
202
2020-04-28T21:16:45Z
Weiler
3
wikitext
text/x-wiki
da39a3ee5e6b4b0d3255bfef95601890afd80709
Quick Start Instructions to Get Rolling with OpenStack
204
203
2020-04-28T22:02:15Z
Weiler
3
wikitext
text/x-wiki
__TOC__
==Request an OpenStack Account==
Once you have [http://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access PRISM/GI VPN access], you can request an OpenStack account. You will need to send an email to cluster-admin@soe.ucsc.edu asking for access, and let us know which lab you are in, or who your PI is, so we can place you in the right OpenStack group.
==Create a SSH Public/Private Keypair==
To log into an OpenStack VM instance, you will need an SSH public key. The key is "injected" into the instance upon creation, and initially only someone holding the matching private key (i.e. you) will be able to log in via SSH. If you already have an SSH keypair that you use elsewhere, you can use it and skip to the next step. If you don't have one yet, log into the UNIX-compatible machine you will be '''logging in from''' (a Mac/Apple computer will also work) and run the 'ssh-keygen' command. If you are behind the VPN, you can first log into mustard, crimson or razzmatazz, which are Linux servers. The session will look something like this:
$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/public/home/frank/.ssh/id_rsa):
Created directory '/public/home/frank/.ssh'.
Enter passphrase (empty for no passphrase): [JUST HIT ENTER]
Enter same passphrase again: [JUST HIT ENTER]
Your identification has been saved in /public/home/frank/.ssh/id_rsa.
Your public key has been saved in /public/home/frank/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:dhJG1A3gcwj7Mz17ommt3NIczMVVgrzp8Tf6F1X4jpI
The key's randomart image is:
+---[RSA 2048]----+
| ..+o.+ ..o.|
| = .. + o..|
| . * .. + ..|
| o = * o|
| So+o + o.|
| . =+oE ooo|
| +o.....o|
| .o++o . .|
| .=o. ...|
+----[SHA256]-----+
You will then have a new directory, "~/.ssh", containing a file called "id_rsa.pub". That is your SSH public key; you will need it in the next step to set up your key in OpenStack.
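If you ever need to confirm which key a given file holds (for example, when you have keys on several machines), ssh-keygen can print its fingerprint. A quick sketch, using a throwaway key generated in a temporary directory purely for illustration; substitute ~/.ssh/id_rsa.pub to check your real key:

```shell
# Generate a throwaway demo key in a temp directory (illustration only),
# then print its fingerprint. Run "ssh-keygen -lf ~/.ssh/id_rsa.pub" to
# check your actual key instead.
tmp=$(mktemp -d)
ssh-keygen -t rsa -N "" -f "$tmp/demo_key" >/dev/null
ssh-keygen -lf "$tmp/demo_key.pub"
```

The fingerprint printed here should match the one OpenStack shows in the Key Pairs list after you import the key.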
==Log In To giCloud==
Once you have been notified that your account has been set up and have been given login credentials, connect to the VPN and then go to this link in your favorite web browser, which is the login page:
https://gicloud.prism
To log in, enter your username and password. You will also see a "Domain" field; enter the word "default" there. Click "Log In" and you will land on your group's summary page.
==Upload your SSH Public Key==
After creating your new key in the "Create a SSH Public/Private Keypair" step above, you will need to upload it into OpenStack. Once you are logged in, on the left-hand navigation menu click "Project", then in the submenu select "Compute", and finally select "Key Pairs". This takes you to the "Key Pairs" window, as shown here.
[[File:Keypairs.png|900px]]
Next click the "Import Public Key" button on the top right of the window. In the resulting window, name your key in the "Key Pair Name" field. Name it something descriptive like "laptop-key" if the key is on your laptop, or "mustard-key" if you are logged into mustard, etc.
To get your key, open a terminal window and print it with "cat ~/.ssh/id_rsa.pub", like so:
$ cat ~/.ssh/id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAyVKNfdBbDIk7Iq8JmL+u3vxAn4M1iaQgMU5tHJhMSAYBZEZRLZAFc+Qovxe5zzs1ixte9lCipLep39q2I4U8XND17nYliZ4HVM4MW4GsMUfKsgX2FI3mB2vAQ9pZSLkAhTg2D+92uALUSSv1cDZhTqo7DuPRX2Upxyd5QbRL6TRFswBjHz2vY/JpaPQm1S1d10mokPpmxehLfwp0mVgmz1Uv/6FflqiZ68DhDN67cs1yQgWYXQ01IHPjzTKRwCuZVkgT99rkoqy6TkAyrvsfzYPZbIA2y+ovOBzq6WCUT9gp5Jx/UE6CxLSmAuGPAJkV5D/twKIe75xc+5jdi3I1cgKw== user@laptop
Copy that whole line, starting with "ssh-rsa" all the way through the very last character, including the "user@laptop" bit (yours will differ; be sure to include it in what you copy).
Then, back in the OpenStack Key Pair dialogue, paste the public key into the "Public Key" field and click "Import Key". The key should then appear in the key list.
==Launch a New Instance==
We are now ready to launch our new VM instance. On the left navigation menu, select "Project", then in the submenu, select "Compute", and finally select "Instances". You will see any currently running instances in your group in the resulting screen.
Next click the "Launch Instance" button on the top right. You will be put into the "Details" tab of the instance creation dialogue. Choose an instance name and enter it into the "Instance Name" field. It should include your username as a prefix so that others know who owns each instance; something like "frank-newtest1" works well. You can ignore the "Description" field; "Availability Zone" should be "nova" and "Count" should be "1".
Next click the "Source" tab on the left. In the "Source" menu, in the "Select Boot Source" field, select "Image". Then, in the list of images below, click the little "Up Arrow" icon to the right of the image you want in order to add it.
Next click the "Flavor" tab on the left. In that menu, choose how much CPU, RAM and disk space you want for your new VM. Some images have minimum requirements, and as such some of the smaller flavors may not be available. Select your flavor by clicking the little "Up Arrow" icon on the right of your flavor.
Next click the "Key Pair" tab on the left. Click the little "Up Arrow" to the right of the Key Pair you imported in the "Upload your SSH Public Key" step.
Ignore the rest of the options on the left; you have configured all you need to launch the instance. Click the blue "Launch Instance" button on the bottom right of the window, as seen below:
[[File:Launch.png|850px]]
You will be taken back to the Instances Summary page, where you should see your new instance launching. After a bit the instance will change from "Spawning" to "Running". This means it is now booting, and should finish in a minute or two. In the meantime, you will need to attach a "Floating IP" address to the instance so that you can SSH into it. On the right side of your running instance you should see a drop-down menu, usually with "Create Snapshot" pre-selected. Click the drop-down arrow to open the menu, and select "Associate Floating IP".
In the "Associate Floating IP" dialogue, open the drop-down menu to see whether any IP addresses are already available; if so, select one. If none are available, click the little "+" button to the right to allocate a floating IP address. It will ask which Pool to use; select "ext-net". You can enter a description if you want, but most folks leave that field blank. Then click "Allocate IP", which takes you back one menu level. Leave the "Port to be Associated" field at its default. Click the blue "Associate" button on the bottom right of the window.
You will be returned to the "Instances Summary" page. You will see your instance running, and it should now list the Floating IP attached to it. That is the IP you will use to SSH to the instance.
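For repeatability, the GUI steps above can also be driven from the command line with the OpenStack client (python-openstackclient), assuming you have sourced your OpenStack credentials. The image, flavor, and instance names below are placeholders, not values from this guide. A sketch, written out as a reference script:

```shell
# Sketch of the equivalent CLI workflow (assumes python-openstackclient and
# sourced credentials; image/flavor/key/instance names are placeholders).
cat > launch-sketch.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
# Import your public key (same as "Import Public Key" in the GUI)
openstack keypair create --public-key ~/.ssh/id_rsa.pub laptop-key
# Launch the instance (Details/Source/Flavor/Key Pair tabs in one command)
openstack server create --image ubuntu-20.04 --flavor m1.medium \
    --key-name laptop-key frank-newtest1
# Allocate a floating IP from ext-net and attach it to the instance
FIP=$(openstack floating ip create ext-net -f value -c floating_ip_address)
openstack server add floating ip frank-newtest1 "$FIP"
EOF
chmod +x launch-sketch.sh
```

Keeping a script like this around also documents how the instance was built, which helps if you ever need to recreate it.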
==Connect to Your New Instance==
Now that your instance is up and running, let's SSH to it and get going! '''From the computer you created your SSH keys on,''' SSH to your instance using the username matching the OS type you chose (ubuntu, centos, etc.) and the Floating IP address of your instance. '''You must be connected to the VPN for this to work!''' Example:
$ ssh ubuntu@10.50.100.67
If you launched a CentOS instance, it would instead be "ssh centos@10.50.100.67", as appropriate. Assuming everything went as planned, you will be logged into your new Linux instance as the "ubuntu" or "centos" user, which is an unprivileged user. If you get a "Connection Refused" error when trying to SSH in, your instance isn't quite through launching yet; try again in about 30 seconds. You do, however, have full sudo rights to do whatever administration you need. At this point it is assumed you have some systems administration skills under your belt, or at least some time to query Google on how to perform various Linux tasks as necessary. Your instance has full access to the greater Internet, so you can download things, run "apt-get install" or "yum update", or whatever is appropriate, and install any software you need to get your work done.
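If you connect often, an entry in ~/.ssh/config on your own machine saves retyping the user and IP each time. A sketch using the example address above; the host alias "givm" is made up, and you would substitute your instance's actual floating IP:

```
Host givm
    HostName 10.50.100.67
    User ubuntu
    IdentityFile ~/.ssh/id_rsa
```

With that in place, "ssh givm" is equivalent to the full command above.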
'''NOTE:''' You are the Systems Administrator of your instance - we cannot support questions on how to administer Linux for you. If OpenStack itself is having issues then please let us know, but please defer questions like "How do I install software on Ubuntu" to Google searches.
==Storage on Your New Instance==
Most of your storage on your new instance will be located in the /mnt directory, as seen by a "df -h" command on your instance:
ubuntu@erich1:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             16G     0   16G   0% /dev
tmpfs           3.2G  676K  3.2G   1% /run
/dev/vda1        20G  975M   19G   5% /
tmpfs            16G     0   16G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/vda15      105M  3.4M  102M   4% /boot/efi
/dev/vdb1       1.0T  1.1G 1023G   1% /mnt
tmpfs           3.2G     0  3.2G   0% /run/user/1000
Notice that "/mnt" has 1TB of disk space, so store all your big important data in /mnt. Avoid storing data on "/" whenever possible to prevent issues with the root filesystem filling up. The exact amount of storage available will depend on what flavor you chose when creating the instance.
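A quick way to keep an eye on the data disk is to check free space before big downloads, and to find the largest directories when it starts to fill. For example, run on the instance:

```shell
# Show free space on the data disk
df -h /mnt
# List the five largest entries under /mnt (may print nothing on a fresh instance)
du -sh /mnt/* 2>/dev/null | sort -h | tail -n 5
```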
==Instance Control Options==
Just a few notes on controlling your instances. They are fully functioning Linux machines, so a "sudo reboot" will reboot the machine, "sudo poweroff" will shut it down, etc. In cloud parlance, "Shut Down" means the instance is still there but the power is off. "Terminated" means it's fully deleted and is unrecoverable, so be sure you want to delete your instance before you do so. We do not back instances up. We also have no access to your instance so we cannot log in and see what's going on.
You can control your instance in several ways from the OpenStack web interface, in the Instance Summary page. On the right side of your instance in the list will be that little drop down menu. Options of interest are:
'''1: Create Snapshot'''
Never use this option as we have not implemented snapshotting in this environment.
'''2: View Log'''
This will show you the boot/console log of the instance, so you can see if anything is causing issues.
'''3: Hard Reboot Instance'''
This will hard reboot your instance, much like pressing the power button to power it off and back on moments later. Useful if your instance is hung from a software crash or similar.
'''4: Delete Instance'''
This will permanently destroy your instance. It will be deleted and is unrecoverable, but it also frees up the resources it was using so that others can use them. This is useful when group quotas have been reached and some old instances need to be cleaned out to make room for new ones.
'''5: Start Instance'''
This option will be available if the instance is in the "Shut Down" state. It will boot up the instance when this option is invoked.
Do not use the other options you may see there, most have not been implemented in our deployment of OpenStack.
==Etiquette==
There is one main thing to remember when using instances in OpenStack: when you create an instance, it uses CPU and RAM, and most importantly it pins disk space. If you use up all the disk, CPU and RAM quota for your group, others have no resources left to create their own instances. The best plan of action is to fire up your VM when you need it, keep it up while you are using it, then copy your data off and delete the instance. Document the steps taken to create your instance so that you could do it again if needed. If the physical node your instance resides on blows up, your instance is lost forever and we have no backups, so it is up to you to back up important data. It is also not good form to spin up an instance, store data there, and then not log in for months at a time; that pins resources others may need for urgent work. Try to be a good neighbor!
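Before deleting an instance, a simple way to preserve results is a dated tarball that you then pull back to your own machine with scp or rsync. The sketch below archives a demo directory created purely for illustration; on a real instance you would point DATA_DIR at your data under /mnt:

```shell
# Archive a data directory into a dated tarball before deleting the instance.
# DATA_DIR here is a demo directory; on a real instance use your /mnt path.
DATA_DIR=$(mktemp -d)/results
mkdir -p "$DATA_DIR"
echo "example output" > "$DATA_DIR/run1.txt"
tar -czf "results-$(date +%F).tar.gz" \
    -C "$(dirname "$DATA_DIR")" "$(basename "$DATA_DIR")"
ls -lh results-*.tar.gz
```

From your own machine, something like "scp ubuntu@10.50.100.67:results-*.tar.gz ." would then retrieve the archive before you delete the instance.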
4348b654cf5869ea105a91e25bb8573b679bad74
205
204
2020-04-28T22:18:06Z
Weiler
3
wikitext
text/x-wiki
__TOC__
==Request an OpenStack Account==
Once you have [http://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access PRISM/GI VPN access], you can request an OpenStack account. You will need to send an email to cluster-admin@soe.ucsc.edu asking for access, and let us know which lab you are in, or who your PI is, so we can place you in the right OpenStack group.
==Create a SSH Public/Private Keypair==
To log into an OpenStack VM instance, you will need a SSH public key. The key is "injected" into the instance upon creation, and only someone with that key (i.e. you) will be able to log in via SSH initially. If you already have a SSH public and private key that you use elsewhere, you can use that one, and can skip to the next step. If you don't have a SSH keypair set up yet, then you will need to log into the UNIX compatible machine you will be '''logging in from''' (a Mac/Apple computer will also work), and then run the 'ssh-keygen' command. If you are behind the VPN, you can first log into mustard, crimson or razzmatazz, which are linux servers. The command will look something like this:
$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/public/home/frank/.ssh/id_rsa):
Created directory '/public/home/frank/.ssh'.
Enter passphrase (empty for no passphrase): [JUST HIT ENTER]
Enter same passphrase again: [JUST HIT ENTER]
Your identification has been saved in /public/home/frank/.ssh/id_rsa.
Your public key has been saved in /public/home/frank/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:dhJG1A3gcwj7Mz17ommt3NIczMVVgrzp8Tf6F1X4jpI
The key's randomart image is:
+---[RSA 2048]----+
| ..+o.+ ..o.|
| = .. + o..|
| . * .. + ..|
| o = * o|
| So+o + o.|
| . =+oE ooo|
| +o.....o|
| .o++o . .|
| .=o. ...|
+----[SHA256]-----+
You will then have a new directory, "~/.ssh", and inside that directory you will have a file called "id_rsa.pub". That is your SSH public key. You will need this in the next step in order to set up your key in OpenStack.
==Log In To giCloud==
Once you have been notified that your account has been set up and have been given login credentials, connect to the VPN and then go to this link in your favorite web browser, which is the login page:
https://gicloud.prism
To login, enter your username and password. Also you will see a "Domain" field, just enter the word "default" for the domain. Click "Log In". You will be logged into your group's summary page.
==Upload your SSH Public Key==
After creating your new key in the above "Create a SSH Public/Private Keypair" step, you will need to upload that key into OpenStack. Once you are logged in, on the left hand navigation menu, click "Project", then in the submenu, select "Compute", and finally select "Key Pairs". It should take you to the "Key Pairs" window as shown here.
[[File:Keypairs.png|900px]]
Next click the "Import Public Key" button on the top right of the window. In the resulting window, name your key in the "Key Pair Name" field. Name it something descriptive like "laptop-key" if the key is on your laptop, or "mustard-key" if you are logged into mustard, etc.
To get your key, open a terminal window and type "cat ~/.ssh/id_rsa.pub" to get your full key, as so:
$ cat ~/.ssh/id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAyVKNfdBbDIk7Iq8JmL+u3vxAn4M1iaQgMU5tHJhMSAYBZEZRLZAFc+Qovxe5zzs1ixte9lCipLep39q2I4U8XND17nYliZ4HVM4MW4GsMUfKsgX2FI3mB2vAQ9pZSLkAhTg2D+92uALUSSv1cDZhTqo7DuPRX2Upxyd5QbRL6TRFswBjHz2vY/JpaPQm1S1d10mokPpmxehLfwp0mVgmz1Uv/6FflqiZ68DhDN67cs1yQgWYXQ01IHPjzTKRwCuZVkgT99rkoqy6TkAyrvsfzYPZbIA2y+ovOBzq6WCUT9gp5Jx/UE6CxLSmAuGPAJkV5D/twKIe75xc+5jdi3I1cgKw== user@laptop
Copy that whole line, starting with "ssh-rsa" all the way through the very last character, including the "user@laptop" bit (which may be different for you, just be sure to include it in the line copy).
Then back in the OpenStack Key Pair dialogue window, paste in the keypair in the "Public Key" window, then click "Import Key". The key should then appear in the key list.
==Launch a New Instance==
We are now ready to launch our new VM instance. On the left navigation menu, select "Project", then in the submenu, select "Compute", and finally select "Instances". You will see any currently running instances in your group in the resulting screen.
Next you need to click the "Launch Instance" button on the top right. You will be put into the "Details" tab in the instance creation dialogue. You need to choose an instance name and enter it into the "Instance Name" field. It should include your username as a prefix so that others know who owns each instance. Something like "frank-newtest1" would work well. You can ignore the "Description" field, "Availability Zone" should be "nova: and "Count" should be "1".
Next click the "Source" tab on the left. In the "Source" menu, in the "Select Boot Source" field, select "Image". Then in the below list of images, choose your image and click the little "Up Arrow" icon to the right of the image you want to add it.
Next click the "Flavor" tab on the left. In that menu, choose how much CPU, RAM and disk space you want for your new VM. Some images have minimum requirements, and as such some of the smaller flavors may not be available. Select your flavor by clicking the little "Up Arrow" icon on the right of your flavor.
Next click the "Key Pair" tab on the left. Click the little "Up Arrow" to the right of the Kep Pair you created in the previous step where you create a Key Pair.
Ignore the rest of the options on the left, you have configured all you need to launch the instance. Click the blue "Launch Instance" button on the bottom right of your window, as seen below:
[[File:Launch.png|850px]]
You will be taken back to the Instances Summary page and you should see your new instance launching. After a bit your instance will change from the "Spawning" to "Running". This means the instance is now booting, and should finish booting in a minute or two. In the meantime we will need to attach a "Floating IP" address to your instance such that you can SSH into the instance. On the right side of your running instance, you should see a drop-down menu, usually the "Create Snapshot" option is pre-selected. Click the drop down menu arrow to open that menu, and select "Associate Floating IP".
In the "Associate Floating IP" dialogue, click the drop down menu to see if any IP addresses are already available, and if so, go ahead and select one. If there are none available, click the little "+" button to the right to allocate a floating IP address. It will ask you what Pool to use, select "ext-net". You can put in a description if you want but most folks leave that field blank. Then click "Allocate IP". It will take you back one menu level. It will have a field "Port to be Associated", just leave that alone with the default that is already there. Click the blue "Associate" button on the bottom right of the window.
You will be returned to the "Instances Summary" page again. You will see your instance running, and it should now list a "Floating IP" that it is running under. That is the IP that you will use to SSH to the instance.
==Connect to Your New Instance==
Now that your instance is up and running, let's SSH to it and get going! '''From the computer you created your SSH keys on,''' SSH to your instance using the username as the OS type you chose (ubuntu, centos, etc), and the Floating IP address your instance has. '''You must be connected to the VPN for this to work!''' Example:
$ ssh ubuntu@10.50.100.67
If you launched a CentOS instance, it would instead be "ssh centos@10.50.100.67", as appropriate. Assuming everything went as planned, you will be logged into your new Linux instance as the "ubuntu" or "centos" user, which is an unprivileged user; you do, however, have full sudo rights to do whatever administration you need. If you get a "Connection Refused" error when trying to SSH in, it means your instance isn't quite through launching yet; try again in about 30 seconds. At this point it is assumed you have some systems administration skills in your belt, or at least have some time to query Google as to how to perform various Linux tasks as necessary. Your instance has full access to the Greater Internet, so you can download things from the Internet, run "apt-get install" or "yum update" or whatever is appropriate, and install any software you need to get your work done.
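Since the image type determines which package-manager commands apply, a quick generic check like the following (a sketch, nothing OpenStack-specific) tells you whether you are on an apt-get or yum style image:

```shell
# Detect which package manager this image ships with, so you know whether
# the apt-get or yum family of commands applies (generic sketch).
if command -v apt-get >/dev/null 2>&1; then
    pkgmsg="Debian/Ubuntu image: use 'sudo apt-get update' then 'sudo apt-get install <package>'"
elif command -v yum >/dev/null 2>&1; then
    pkgmsg="RHEL/CentOS image: use 'sudo yum install <package>'"
else
    pkgmsg="Unrecognized image: check its documentation for the package manager"
fi
echo "$pkgmsg"
```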
'''NOTE:''' You are the Systems Administrator of your instance - we cannot support questions on how to administer Linux for you. If OpenStack itself is having issues then please let us know, but please defer questions like "How do I install software on Ubuntu" to Google searches.
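If you connect to the instance often, an entry in your SSH client config saves retyping the floating IP each time. The alias, address and key path below are examples only; the snippet writes to a temp file so it can be tried safely, but for real use you would append the same stanza to ~/.ssh/config:

```shell
# Append a host alias so "ssh oscloud" expands to the full user@IP command.
# Writing to a temp file here for safety; use ~/.ssh/config for real.
cfg=$(mktemp)
cat >> "$cfg" <<'EOF'
Host oscloud
    HostName 10.50.100.67
    User ubuntu
    IdentityFile ~/.ssh/id_rsa
EOF
cat "$cfg"
```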
==Storage on Your New Instance==
Most of your storage on your new instance will be located in the /mnt directory, as seen by a "df -h" command on your instance:
ubuntu@erich1:~$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 16G 0 16G 0% /dev
tmpfs 3.2G 676K 3.2G 1% /run
/dev/vda1 20G 975M 19G 5% /
tmpfs 16G 0 16G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/vda15 105M 3.4M 102M 4% /boot/efi
/dev/vdb1 1.0T 1.1G 1023G 1% /mnt
tmpfs 3.2G 0 3.2G 0% /run/user/1000
Notice that "/mnt" has 1TB of disk space, so store all your big important data in /mnt. Avoid storing data on "/" whenever possible to prevent issues with the root filesystem filling up. The exact amount of storage available will depend on what flavor you chose when creating the instance.
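To check how much room is left before writing large files, you can parse the POSIX-format df output; on your instance you would pass /mnt, but "." (the current directory) is used below only so the command works on any machine:

```shell
# Print the space still available on a filesystem. On your instance you
# would pass /mnt instead of "." (the current directory).
avail=$(df -hP . | awk 'NR==2 {print $4}')
echo "available space: $avail"
```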
==Instance Control Options==
Just a few notes on controlling your instances. They are fully functioning Linux machines, so a "sudo reboot" will reboot the machine, "sudo poweroff" will shut it down, etc. In cloud parlance, "Shut Down" means the instance is still there but the power is off. "Terminated" means it's fully deleted and is unrecoverable, so be sure you want to delete your instance before you do so. We do not back instances up. We also have no access to your instance so we cannot log in and see what's going on.
You can control your instance in several ways from the OpenStack web interface, in the Instance Summary page. On the right side of your instance in the list will be that little drop down menu. Options of interest are:
'''1: Create Snapshot'''
Never use this option as we have not implemented snapshotting in this environment.
'''2: View Log'''
This will show you the boot/console log of the instance, so you can see if anything is causing issues.
'''3: Hard Reboot Instance'''
This will hard reboot your instance, kind of like hitting the power button to power the instance off; it will power back on moments later. Useful if your instance is hosed because of a software crash or a similar problem.
'''4: Delete Instance'''
This will permanently destroy your instance. It will be deleted and is unrecoverable. It will also free up the resources it was using so that others can use them. This is useful if the group quotas have been reached and some old instances need to be cleaned out to make room for new ones.
'''5: Start Instance'''
This option will be available if the instance is in the "Shut Down" state. It will boot up the instance when this option is invoked.
Do not use the other options you may see there, most have not been implemented in our deployment of OpenStack.
==Changing Your OpenStack Web Interface Password==
Once you have logged in to the Web Interface, you can change your password by doing the following.
On the top right of the OpenStack web interface, you should see a little icon with your username on it. Click that icon to expand the drop down menu there, and select "Settings". Then in the next window, on the left navigation bar, you should see the "Change Password" button. Complete the Change Password dialogue to change your password. You may have to log in again after changing your password.
==Etiquette==
There is one main thing to remember when using instances in OpenStack. When you create an instance, it uses CPU, RAM and, most importantly, it pins disk space for that instance. If you use up all the disk, CPU and RAM quota for your group, then others have no resources left to create their own instances. The best plan of action is to fire up your VM, keep it up while you need it, then copy your data off it and delete the instance. Document the steps taken to create your instance so that you could do it again if you needed to. If the physical node that your instance resides on blows up, then your instance is lost forever and we have no backups, so it is up to you to back up important data. It's also not good form to spin up an instance, store data there, and not log in for months at a time; then you are pinning resources that others may need for urgent work. Try to be a good neighbor!
__TOC__
==Request an OpenStack Account==
Once you have [http://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access PRISM/GI VPN access], you can request an OpenStack account. You will need to send an email to cluster-admin@soe.ucsc.edu asking for access, and let us know which lab you are in, or who your PI is, so we can place you in the right OpenStack group.
==Create a SSH Public/Private Keypair==
To log into an OpenStack VM instance, you will need an SSH public key. The key is "injected" into the instance upon creation, and only someone with that key (i.e. you) will be able to log in via SSH initially. If you already have an SSH public and private key that you use elsewhere, you can use that one, and can skip to the next step. If you don't have an SSH keypair set up yet, then you will need to log into the UNIX-compatible machine you will be '''logging in from''' (a Mac/Apple computer will also work), and then run the 'ssh-keygen' command. If you are behind the VPN, you can first log into mustard, crimson or razzmatazz, which are Linux servers. The command will look something like this:
$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/public/home/frank/.ssh/id_rsa):
Created directory '/public/home/frank/.ssh'.
Enter passphrase (empty for no passphrase): [JUST HIT ENTER]
Enter same passphrase again: [JUST HIT ENTER]
Your identification has been saved in /public/home/frank/.ssh/id_rsa.
Your public key has been saved in /public/home/frank/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:dhJG1A3gcwj7Mz17ommt3NIczMVVgrzp8Tf6F1X4jpI
The key's randomart image is:
+---[RSA 2048]----+
| ..+o.+ ..o.|
| = .. + o..|
| . * .. + ..|
| o = * o|
| So+o + o.|
| . =+oE ooo|
| +o.....o|
| .o++o . .|
| .=o. ...|
+----[SHA256]-----+
You will then have a new directory, "~/.ssh", and inside that directory you will have a file called "id_rsa.pub". That is your SSH public key. You will need this in the next step in order to set up your key in OpenStack.
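The interactive session above can also be done in one non-interactive command; the comment string and output directory below are examples (a temp dir is used here so an existing ~/.ssh/id_rsa is never overwritten):

```shell
# Non-interactive keypair generation: -N "" sets an empty passphrase and
# -f picks the output path (a temp dir here to avoid clobbering real keys).
keydir=$(mktemp -d)
ssh-keygen -q -t rsa -b 4096 -N "" -f "$keydir/id_rsa" -C "frank@laptop"
cat "$keydir/id_rsa.pub"
```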
==Log In To giCloud==
Once you have been notified that your account has been set up and have been given login credentials, connect to the VPN and then go to this link in your favorite web browser, which is the login page:
https://gicloud.prism
To log in, enter your username and password. You will also see a "Domain" field; just enter the word "default" for the domain. Click "Log In". You will be logged into your group's summary page.
==Upload your SSH Public Key==
After creating your new key in the above "Create a SSH Public/Private Keypair" step, you will need to upload that key into OpenStack. Once you are logged in, on the left hand navigation menu, click "Project", then in the submenu, select "Compute", and finally select "Key Pairs". It should take you to the "Key Pairs" window as shown here.
[[File:Keypairs.png|900px]]
Next click the "Import Public Key" button on the top right of the window. In the resulting window, name your key in the "Key Pair Name" field. Name it something descriptive like "laptop-key" if the key is on your laptop, or "mustard-key" if you are logged into mustard, etc.
To get your key, open a terminal window and type "cat ~/.ssh/id_rsa.pub" to print your full key, like so:
$ cat ~/.ssh/id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAyVKNfdBbDIk7Iq8JmL+u3vxAn4M1iaQgMU5tHJhMSAYBZEZRLZAFc+Qovxe5zzs1ixte9lCipLep39q2I4U8XND17nYliZ4HVM4MW4GsMUfKsgX2FI3mB2vAQ9pZSLkAhTg2D+92uALUSSv1cDZhTqo7DuPRX2Upxyd5QbRL6TRFswBjHz2vY/JpaPQm1S1d10mokPpmxehLfwp0mVgmz1Uv/6FflqiZ68DhDN67cs1yQgWYXQ01IHPjzTKRwCuZVkgT99rkoqy6TkAyrvsfzYPZbIA2y+ovOBzq6WCUT9gp5Jx/UE6CxLSmAuGPAJkV5D/twKIe75xc+5jdi3I1cgKw== user@laptop
Copy that whole line, starting with "ssh-rsa" all the way through the very last character, including the "user@laptop" bit (which may be different for you; just be sure to include it in what you copy).
Then back in the OpenStack Key Pair dialogue window, paste in the keypair in the "Public Key" window, then click "Import Key". The key should then appear in the key list.
==Launch a New Instance==
We are now ready to launch our new VM instance. On the left navigation menu, select "Project", then in the submenu, select "Compute", and finally select "Instances". You will see any currently running instances in your group in the resulting screen.
Next you need to click the "Launch Instance" button on the top right. You will be put into the "Details" tab in the instance creation dialogue. You need to choose an instance name and enter it into the "Instance Name" field. It should include your username as a prefix so that others know who owns each instance. Something like "frank-newtest1" would work well. You can ignore the "Description" field, "Availability Zone" should be "nova" and "Count" should be "1".
Next click the "Source" tab on the left. In the "Source" menu, in the "Select Boot Source" field, select "Image". Then in the list of images below, choose your image and click the little "Up Arrow" icon to its right to add it.
Next click the "Flavor" tab on the left. In that menu, choose how much CPU, RAM and disk space you want for your new VM. Some images have minimum requirements, and as such some of the smaller flavors may not be available. Select your flavor by clicking the little "Up Arrow" icon on the right of your flavor.
Next click the "Key Pair" tab on the left. Click the little "Up Arrow" to the right of the Key Pair you created in the earlier step where you created a Key Pair.
==Networking==
Your instances are connected at 10Gb/s to each other and to the internet. Of course, actual transfer speeds will likely vary based on disk speed, the speed of the location you are transferring data to or from, and other factors.
Your instance will be located in a private network that can only be seen by other instances in your group. Other OpenStack groups are logically separated into their own networks and your instance cannot route to them. Also, no one can access your instance unless they have a VPN account with us, so your instances are completely fenced off from inbound connections from the Greater Internet, which means you are largely secure against script kiddies and hackers. You are able to connect outbound from your instances.
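Outbound access from an instance can be sanity-checked with a quick HTTP request; the URL below is just an example, and the command prints 000 if the request never completed at all (e.g. no route out):

```shell
# Print the HTTP status code for an outbound request. 000 means the request
# could not be made at all. Example URL; any reliable external site works.
code=$(curl -s --max-time 10 -o /dev/null -w '%{http_code}' http://example.com || true)
echo "outbound HTTP status: $code"
```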
==Log In To giCloud==
Once you have been notified that your account has been set up and have been given login credentials, connect to the VPN and then go to this link in your favorite web browser, which is the login page:
http://gicloud.prism
To login, enter your username and password. Also you will see a "Domain" field, just enter the word "default" for the domain. Click "Log In". You will be logged into your group's summary page.
==Upload your SSH Public Key==
After creating your new key in the above "Create a SSH Public/Private Keypair" step, you will need to upload that key into OpenStack. Once you are logged in, on the left hand navigation menu, click "Project", then in the submenu, select "Compute", and finally select "Key Pairs". It should take you to the "Key Pairs" window as shown here.
[[File:Keypairs.png|900px]]
Next click the "Import Public Key" button on the top right of the window. In the resulting window, name your key in the "Key Pair Name" field. Name it something descriptive like "laptop-key" if the key is on your laptop, or "mustard-key" if you are logged into mustard, etc.
To get your key, open a terminal window and type "cat ~/.ssh/id_rsa.pub" to get your full key, as so:
$ cat ~/.ssh/id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAyVKNfdBbDIk7Iq8JmL+u3vxAn4M1iaQgMU5tHJhMSAYBZEZRLZAFc+Qovxe5zzs1ixte9lCipLep39q2I4U8XND17nYliZ4HVM4MW4GsMUfKsgX2FI3mB2vAQ9pZSLkAhTg2D+92uALUSSv1cDZhTqo7DuPRX2Upxyd5QbRL6TRFswBjHz2vY/JpaPQm1S1d10mokPpmxehLfwp0mVgmz1Uv/6FflqiZ68DhDN67cs1yQgWYXQ01IHPjzTKRwCuZVkgT99rkoqy6TkAyrvsfzYPZbIA2y+ovOBzq6WCUT9gp5Jx/UE6CxLSmAuGPAJkV5D/twKIe75xc+5jdi3I1cgKw== user@laptop
Copy that whole line, starting with "ssh-rsa" all the way through the very last character, including the "user@laptop" bit (which may be different for you, just be sure to include it in the line copy).
Then back in the OpenStack Key Pair dialogue window, paste in the keypair in the "Public Key" window, then click "Import Key". The key should then appear in the key list.
==Launch a New Instance==
We are now ready to launch our new VM instance. On the left navigation menu, select "Project", then in the submenu, select "Compute", and finally select "Instances". You will see any currently running instances in your group in the resulting screen.
Next you need to click the "Launch Instance" button on the top right. You will be put into the "Details" tab in the instance creation dialogue. You need to choose an instance name and enter it into the "Instance Name" field. It should include your username as a prefix so that others know who owns each instance. Something like "frank-newtest1" would work well. You can ignore the "Description" field, "Availability Zone" should be "nova: and "Count" should be "1".
Next click the "Source" tab on the left. In the "Source" menu, in the "Select Boot Source" field, select "Image". Then in the below list of images, choose your image and click the little "Up Arrow" icon to the right of the image you want to add it.
Next click the "Flavor" tab on the left. In that menu, choose how much CPU, RAM and disk space you want for your new VM. Some images have minimum requirements, and as such some of the smaller flavors may not be available. Select your flavor by clicking the little "Up Arrow" icon on the right of your flavor.
Next click the "Key Pair" tab on the left. Click the little "Up Arrow" to the right of the Kep Pair you created in the previous step where you create a Key Pair.
Ignore the rest of the options on the left, you have configured all you need to launch the instance. Click the blue "Launch Instance" button on the bottom right of your window, as seen below:
[[File:Launch.png|850px]]
You will be taken back to the Instances Summary page and you should see your new instance launching. After a bit your instance will change from the "Spawning" to "Running". This means the instance is now booting, and should finish booting in a minute or two. In the meantime we will need to attach a "Floating IP" address to your instance such that you can SSH into the instance. On the right side of your running instance, you should see a drop-down menu, usually the "Create Snapshot" option is pre-selected. Click the drop down menu arrow to open that menu, and select "Associate Floating IP".
In the "Associate Floating IP" dialogue, click the drop down menu to see if any IP addresses are already available, and if so, go ahead and select one. If there are none available, click the little "+" button to the right to allocate a floating IP address. It will ask you what Pool to use, select "ext-net". You can put in a description if you want but most folks leave that field blank. Then click "Allocate IP". It will take you back one menu level. It will have a field "Port to be Associated", just leave that alone with the default that is already there. Click the blue "Associate" button on the bottom right of the window.
You will be returned to the "Instances Summary" page again. You will see your instance running, and it should now list a "Floating IP" that it is running under. That is the IP that you will use to SSH to the instance.
==Connect to Your New Instance==
Now that your instance is up and running, let's SSH to it and get going! '''From the computer you created your SSH keys on,''' SSH to your instance using the username as the OS type you chose (ubuntu, centos, etc), and the Floating IP address your instance has. '''You must be connected to the VPN for this to work!''' Example:
$ ssh ubuntu@10.50.100.67
If you launched a CentOS instance, it would instead be "ssh centos@10.50.100.67", as appropriate. Assuming everything went as planned, you will be logged into your new Linux instance as the "ubuntu" or "centos" user, which is an unprivileged user. If you get a "Connection Refused" error when trying to SSH in, it means your instance isn't quite through launching yet, try again in about 30 seconds. You have full sudo rights however to do whatever administration you need to do. At this point it is assumed you have a little systems administration skills in your belt, or at least have some time to query Google as to how to perform various Linux tasks as necessary. Your instance has full Internet access to the Greater Internet, so you can download thing fro the Internet, run "apt-get install" or "yum update" or whatever is appropriate. You can also then install any needed software you need to get your work done.
'''NOTE:''' Your are the Systems Administrator of your instance - we cannot support questions on how to administer Linux for you. If OpenStack itself is having issues then please let us know, but please defer questions like "How do I install software on Ubuntu" to Google searches.
==Storage on Your New Instance==
Most of your storage on your new instance will be located in the /mnt directory, as seen by a "df -h" command on your instance:
ubuntu@erich1:~$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 16G 0 16G 0% /dev
tmpfs 3.2G 676K 3.2G 1% /run
/dev/vda1 20G 975M 19G 5% /
tmpfs 16G 0 16G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/vda15 105M 3.4M 102M 4% /boot/efi
/dev/vdb1 1.0T 1.1G 1023G 1% /mnt
tmpfs 3.2G 0 3.2G 0% /run/user/1000
Notice that "/mnt" has 1TB of disk space, so store all your big important data in /mnt. Avoid storing data on "/" whenever possible to prevent issues with the root filesystem filling up. The exact amount of storage available will depend on what flavor you chose when creating the instance.
==Instance Control Options==
Just a few notes on controlling your instances. They are fully functioning Linux machines, so a "sudo reboot" will reboot the machine, "sudo poweroff" will shut it down, etc. In cloud parlance, "Shut Down" means the instance is still there but the power is off. "Terminated" means it's fully deleted and is unrecoverable, so be sure you want to delete your instance before you do so. We do not back instances up. We also have no access to your instance so we cannot log in and see what's going on.
You can control your instance in several ways from the OpenStack web interface, in the Instance Summary page. On the right side of your instance in the list will be that little drop down menu. Options of interest are:
'''1: Create Snapshot'''
Never use this option as we have not implemented snapshotting in this environment.
'''2: View Log'''
This will show you the boot/console log of the instance, so you can see if anything is causing issues.
'''3: Hard Reboot Instance'''
This will hard reboot your instance, kind of like hitting the power button to power the instance off, then it will power back on moments later. Useful if your instance is hosed because of a software crash or other things that may have crashed the instance.
'''4: Delete Instance'''
This will permanently destroy your instance. It will be deleted and is unrecoverable. It will also free up the resources it was using such that others can use them however. This is useful if the group quotas have been reached and some old instances need to be cleaned out to make room for new ones.
'''5: Start Instance'''
This option will be available if the instance is in the "Shut Down" state. It will boot up the instance when this option is invoked.
Do not use the other options you may see there, most have not been implemented in our deployment of OpenStack.
==Changing Your OpenStack Web Interface Password==
Once you have logged in to the Web Interface, you can change your password by doing the following.
On the top right of the OpenStack web interface, you should see a little icon with your username on it. Click that icon to expand the drop down menu there, and select "Settings". Then in the next window, on the left navigation bar, you should see the "Change Password" button. Complete the Change Password dialogue to change your password. You may have to log in again after changing your password.
==Networking==
Your instances are connected at 10Gb/s between each other and the internet. Of course, actual transfer speeds will likely vary based on disk speed, speed of the location to are transferring data to or from, and other factors.
Your instance will be located in a private network that can only be seen by other instances in your group. Other OpenStack groups are logically separated into their own networks and your instance cannot route to them. Also, no one can access your instance unless they have a VPN account with us, so your instances are completely fenced off from the Greater Internet inbound, which means you are largely secure against script kiddies and hackers. You are able to connect outbound from your instances.
==Etiquette==
There is one main thing to remember when using instances in OpenStack. When you create an instance, it uses CPU, RAM and most importantly, it pins disk space for that instance. If you use up all the disk, CPU and RAM quota for your group, then others have no resources left to create their own instances. It is important to know that the best plan of action is to fire up your VM and keep it up when you need it, and then copy your data off it and delete the instance. Document steps taken to create your instance such that you could do it again if you needed to. If the physical node that your instance resides on blows up, then your instance is lost forever and we have no backups, so it is up to you to back up important data. It's also not good form to spin up an instance and store data there, but not log in for months at a time. Then, you are pinning resources that other may need for urgent work. Try to be a good neighbor!
9af0682c3a53f22ad51048917b3938299d63dfa0
215
210
2020-07-16T23:23:49Z
Jgarcia
2
/* Launch a New Instance */
wikitext
text/x-wiki
__TOC__
==Request an OpenStack Account==
Once you have [http://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access PRISM/GI VPN access], you can request an OpenStack account. You will need to send an email to cluster-admin@soe.ucsc.edu asking for access, and let us know which lab you are in, or who your PI is, so we can place you in the right OpenStack group.
==Create an SSH Public/Private Keypair==
To log into an OpenStack VM instance, you will need an SSH public key. The key is "injected" into the instance upon creation, and initially only someone holding the matching private key (i.e. you) will be able to log in via SSH. If you already have an SSH public and private key that you use elsewhere, you can use that one and skip to the next step. If you don't have an SSH keypair set up yet, log into the UNIX-compatible machine you will be '''logging in from''' (a Mac/Apple computer will also work) and run the 'ssh-keygen' command. If you are behind the VPN, you can first log into mustard, crimson or razzmatazz, which are Linux servers. The command will look something like this:
$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/public/home/frank/.ssh/id_rsa):
Created directory '/public/home/frank/.ssh'.
Enter passphrase (empty for no passphrase): [JUST HIT ENTER]
Enter same passphrase again: [JUST HIT ENTER]
Your identification has been saved in /public/home/frank/.ssh/id_rsa.
Your public key has been saved in /public/home/frank/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:dhJG1A3gcwj7Mz17ommt3NIczMVVgrzp8Tf6F1X4jpI
The key's randomart image is:
+---[RSA 2048]----+
| ..+o.+ ..o.|
| = .. + o..|
| . * .. + ..|
| o = * o|
| So+o + o.|
| . =+oE ooo|
| +o.....o|
| .o++o . .|
| .=o. ...|
+----[SHA256]-----+
You will then have a new directory, "~/.ssh", and inside that directory you will have a file called "id_rsa.pub". That is your SSH public key. You will need this in the next step in order to set up your key in OpenStack.
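As an optional sanity check (a sketch, not part of the official setup), you can print a public key file's fingerprint with "ssh-keygen -l" to confirm the file is a valid SSH public key. On your machine you would point it at ~/.ssh/id_rsa.pub; here a throwaway key is generated in a temp directory so the example is self-contained:

```shell
# Generate a disposable demo key in a temp directory (stand-in for ~/.ssh/id_rsa)
demo=$(mktemp -d)
ssh-keygen -q -t rsa -b 2048 -N "" -f "$demo/id_rsa"

# -l prints: <bits> SHA256:<fingerprint> <comment> (<type>)
ssh-keygen -lf "$demo/id_rsa.pub"
```

If the file is not a valid public key, ssh-keygen exits with an error instead of printing a fingerprint.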
==Log In To giCloud==
Once you have been notified that your account has been set up and have been given login credentials, connect to the VPN and then go to this link in your favorite web browser, which is the login page:
http://gicloud.prism
To log in, enter your username and password. You will also see a "Domain" field; just enter the word "default" for the domain. Click "Log In", and you will be logged into your group's summary page.
==Upload your SSH Public Key==
After creating your new key in the above "Create an SSH Public/Private Keypair" step, you will need to upload that key into OpenStack. Once you are logged in, in the left-hand navigation menu, click "Project", then in the submenu select "Compute", and finally select "Key Pairs". This should take you to the "Key Pairs" window, as shown here.
[[File:Keypairs.png|900px]]
Next click the "Import Public Key" button on the top right of the window. In the resulting window, name your key in the "Key Pair Name" field. Name it something descriptive like "laptop-key" if the key is on your laptop, or "mustard-key" if you are logged into mustard, etc.
To get your key, open a terminal window and type "cat ~/.ssh/id_rsa.pub" to get your full key, as so:
$ cat ~/.ssh/id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAyVKNfdBbDIk7Iq8JmL+u3vxAn4M1iaQgMU5tHJhMSAYBZEZRLZAFc+Qovxe5zzs1ixte9lCipLep39q2I4U8XND17nYliZ4HVM4MW4GsMUfKsgX2FI3mB2vAQ9pZSLkAhTg2D+92uALUSSv1cDZhTqo7DuPRX2Upxyd5QbRL6TRFswBjHz2vY/JpaPQm1S1d10mokPpmxehLfwp0mVgmz1Uv/6FflqiZ68DhDN67cs1yQgWYXQ01IHPjzTKRwCuZVkgT99rkoqy6TkAyrvsfzYPZbIA2y+ovOBzq6WCUT9gp5Jx/UE6CxLSmAuGPAJkV5D/twKIe75xc+5jdi3I1cgKw== user@laptop
Copy that whole line, starting with "ssh-rsa" all the way through the very last character, including the "user@laptop" bit (which may be different for you; just be sure to include it when you copy the line).
Then, back in the OpenStack Key Pair dialogue window, paste the key into the "Public Key" field and click "Import Key". The key should then appear in the key list.
==Launch a New Instance==
We are now ready to launch our new VM instance. On the left navigation menu, select "Project", then in the submenu, select "Compute", and finally select "Instances". You will see any currently running instances in your group in the resulting screen.
Next, click the "Launch Instance" button on the top right. You will be placed in the "Details" tab of the instance creation dialogue. Choose an instance name and enter it into the "Instance Name" field. It should include your username as a prefix so that others know who owns each instance; something like "frank-newtest1" would work well. You can ignore the "Description" field, "Availability Zone" should be "nova" and "Count" should be "1".
Next click the "Source" tab on the left. In the "Source" menu, in the "Select Boot Source" field, select "Image" and, next to it, select "No" for "Create New Volume". Then, in the list of images below, find the image you want and click the little "Up Arrow" icon to its right to add it.
Next click the "Flavor" tab on the left. In that menu, choose how much CPU, RAM and disk space you want for your new VM. Some images have minimum requirements, and as such some of the smaller flavors may not be available. Select your flavor by clicking the little "Up Arrow" icon on the right of your flavor.
Next click the "Key Pair" tab on the left. Click the little "Up Arrow" to the right of the Key Pair you created in the previous step.
Ignore the rest of the options on the left; you have configured all you need to launch the instance. Click the blue "Launch Instance" button on the bottom right of your window, as seen below:
[[File:Launch.png|850px]]
You will be taken back to the Instances Summary page, where you should see your new instance launching. After a bit your instance will change from "Spawning" to "Running". This means the instance is now booting, and it should finish booting in a minute or two. In the meantime, we will need to attach a "Floating IP" address to your instance so that you can SSH into it. On the right side of your running instance, you should see a drop-down menu, usually with the "Create Snapshot" option pre-selected. Click the drop-down menu arrow to open that menu, and select "Associate Floating IP".
In the "Associate Floating IP" dialogue, click the drop down menu to see if any IP addresses are already available, and if so, go ahead and select one. If there are none available, click the little "+" button to the right to allocate a floating IP address. It will ask you what Pool to use, select "ext-net". You can put in a description if you want but most folks leave that field blank. Then click "Allocate IP". It will take you back one menu level. It will have a field "Port to be Associated", just leave that alone with the default that is already there. Click the blue "Associate" button on the bottom right of the window.
You will be returned to the "Instances Summary" page again. You will see your instance running, and it should now list a "Floating IP" that it is running under. That is the IP that you will use to SSH to the instance.
==Connect to Your New Instance==
Now that your instance is up and running, let's SSH to it and get going! '''From the computer you created your SSH keys on,''' SSH to your instance using the username matching the OS type you chose (ubuntu, centos, etc.) and the Floating IP address your instance has. '''You must be connected to the VPN for this to work!''' Example:
$ ssh ubuntu@10.50.100.67
If you launched a CentOS instance, it would instead be "ssh centos@10.50.100.67", as appropriate. Assuming everything went as planned, you will be logged into your new Linux instance as the "ubuntu" or "centos" user, which is an unprivileged user; you do, however, have full sudo rights to do whatever administration you need. If you get a "Connection Refused" error when trying to SSH in, your instance isn't quite through launching yet; try again in about 30 seconds. At this point it is assumed you have some systems administration skills under your belt, or at least some time to query Google on how to perform various Linux tasks as necessary. Your instance has full access to the greater Internet, so you can download things from the Internet, run "apt-get install" or "yum update" or whatever is appropriate, and then install any software you need to get your work done.
'''NOTE:''' You are the Systems Administrator of your instance - we cannot support questions on how to administer Linux for you. If OpenStack itself is having issues then please let us know, but please defer questions like "How do I install software on Ubuntu" to Google searches.
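Since a freshly launched instance refuses SSH connections until it finishes booting, a small polling helper can tell you when it is ready. This is just a sketch (the floating IP below is the hypothetical example address from above); it uses bash's built-in /dev/tcp, so no extra tools are needed:

```shell
#!/usr/bin/env bash
# Sketch: poll a TCP port until it accepts connections, so you know when a
# freshly launched instance is ready for SSH. Uses bash's /dev/tcp built-in.
wait_for_port() {
  local host=$1 port=$2 tries=${3:-20}
  local i
  for ((i = 1; i <= tries; i++)); do
    # The redirection succeeds only once something is listening on the port
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      echo "Port $port on $host is up"
      return 0
    fi
    sleep 3
  done
  echo "Timed out waiting for $host:$port" >&2
  return 1
}

# Example usage with a hypothetical floating IP:
# wait_for_port 10.50.100.67 22 && ssh ubuntu@10.50.100.67
```

This avoids manually retrying every 30 seconds after a "Connection Refused" error.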
==Storage on Your New Instance==
Most of your storage on your new instance will be located in the /mnt directory, as seen by a "df -h" command on your instance:
ubuntu@erich1:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             16G     0   16G   0% /dev
tmpfs           3.2G  676K  3.2G   1% /run
/dev/vda1        20G  975M   19G   5% /
tmpfs            16G     0   16G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/vda15      105M  3.4M  102M   4% /boot/efi
/dev/vdb1       1.0T  1.1G 1023G   1% /mnt
tmpfs           3.2G     0  3.2G   0% /run/user/1000
Notice that "/mnt" has 1 TB of disk space, so store all your big, important data in /mnt. Avoid storing data on "/" whenever possible, to prevent the root filesystem from filling up. The exact amount of storage available will depend on what flavor you chose when creating the instance.
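If you want to keep an eye on how full a filesystem is getting, a one-liner around "df" works. This is only a sketch, using "/" as a stand-in for whichever mount point (such as /mnt on your instance) you care about:

```shell
# Sketch: report how full a filesystem is, so you notice before it fills up.
# Pass the mount point you care about (on an instance, typically /mnt or /).
usage_pct() {
  # -P forces POSIX single-line output; column 5 is the Use%/Capacity field
  df -P "${1:-/}" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

echo "Root filesystem is $(usage_pct /)% used"
```

You could call this from a cron job to warn yourself before a big download fills "/".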
==Instance Control Options==
Just a few notes on controlling your instances. They are fully functioning Linux machines, so a "sudo reboot" will reboot the machine, "sudo poweroff" will shut it down, etc. In cloud parlance, "Shut Down" means the instance is still there but the power is off; "Terminated" means it is fully deleted and unrecoverable, so be sure you want to delete your instance before you do so. We do not back up instances. We also have no access to your instance, so we cannot log in and see what's going on.
You can control your instance in several ways from the OpenStack web interface, in the Instance Summary page. On the right side of your instance in the list will be that little drop down menu. Options of interest are:
'''1: Create Snapshot'''
Never use this option as we have not implemented snapshotting in this environment.
'''2: View Log'''
This will show you the boot/console log of the instance, so you can see if anything is causing issues.
'''3: Hard Reboot Instance'''
This will hard reboot your instance, kind of like hitting the power button to power the instance off, then it will power back on moments later. Useful if your instance is hosed because of a software crash or other things that may have crashed the instance.
'''4: Delete Instance'''
This will permanently destroy your instance. It will be deleted and is unrecoverable. It will, however, free up the resources it was using so that others can use them. This is useful if the group quotas have been reached and some old instances need to be cleaned out to make room for new ones.
'''5: Start Instance'''
This option will be available if the instance is in the "Shut Down" state. It will boot up the instance when this option is invoked.
Do not use the other options you may see there, most have not been implemented in our deployment of OpenStack.
==Changing Your OpenStack Web Interface Password==
Once you have logged in to the Web Interface, you can change your password by doing the following.
On the top right of the OpenStack web interface, you should see a little icon with your username on it. Click that icon to expand the drop down menu there, and select "Settings". Then in the next window, on the left navigation bar, you should see the "Change Password" button. Complete the Change Password dialogue to change your password. You may have to log in again after changing your password.
==Networking==
Your instances are connected at 10Gb/s to each other and to the Internet. Of course, actual transfer speeds will likely vary based on disk speed, the speed of the host you are transferring data to or from, and other factors.
Your instance will be located in a private network that can only be seen by other instances in your group. Other OpenStack groups are logically separated into their own networks, and your instance cannot route to them. Also, no one can access your instance unless they have a VPN account with us, so your instances are completely fenced off from inbound connections from the greater Internet, which means you are largely secure against script kiddies and hackers. You are still able to connect outbound from your instances.
==Etiquette==
There is one main thing to remember when using instances in OpenStack: when you create an instance, it uses CPU and RAM and, most importantly, it pins disk space for that instance. If you use up all the disk, CPU and RAM quota for your group, then others have no resources left to create their own instances. The best plan of action is to fire up your VM, keep it up while you need it, then copy your data off and delete the instance. Document the steps taken to create your instance so that you could do it again if you needed to. If the physical node that your instance resides on blows up, your instance is lost forever and we have no backups, so it is up to you to back up important data. It's also not good form to spin up an instance, store data there, and then not log in for months at a time; that pins resources that others may need for urgent work. Try to be a good neighbor!
AWS Account List and Numbers
This is a list of our currently available AWS accounts and their account numbers:
ucsc-bd2k : 862902209576
ucsc-toil-dev : 318423852362
ucsc-vg-dev : 781907127277
ucsc-platform-dev : 719818754276
comparative-genomics-dev : 162786355865
nanopore-dev : 270442831226
ucsc-cgp-production : 097093801910
platform-hca-dev : 122796619775
anvil-dev : 608666466534
gi-gateway : 652235167018
pangenomics : 422448306679
braingeneers : 443872533066
ucsctreehouse : 238605363322
ucsc-bisti-dev : 851631505710
dockstore-dev : 635220370222
ucsc-spatial : 541180793903
platform-hca-prod : 542754589326
miga-lab : 156518225147
Computational Genomics Kubernetes Installation
__TOC__
The Computational Genomics Group has a Kubernetes Cluster running on several large instances in AWS. The current cluster makeup includes three worker nodes, each with the following specs:
* 96 CPU cores (3.1 GHz)
* 384 GB RAM
* 3.3 TB Local NVMe Flash Storage
* 25 Gb/s Network Interface
==Getting Authorized to Connect==
If you require access to this Kubernetes cluster, contact Benedict Paten to ask for permission to use it, then forward that permission via email to:
cluster-admin@soe.ucsc.edu
Let us know which group you are with and we can authorize you to use the cluster in the correct namespace.
==Authenticating to Kubernetes==
We will authorize (authz) you to use the cluster on the server side, but you will also need to authenticate (authn) using your '@ucsc.edu' email address and a unique JSON Web Token (JWT). These credentials are installed in ~/.kube/config on whatever machine you connect from to reach the cluster.
To authenticate and get your base kubernetes configuration, go to the URL below, which will ask you to authenticate with Google. Use your '@ucsc.edu' email address as the login. It will then ask you to authenticate via CruzID Gold if your web browser doesn't already have the authentication token cached:
https://cg-kube-auth.gi.ucsc.edu
Once you authenticate (via username/password and 2-factor auth for CruzID Gold), it will pass you back to the 'https://cg-kube-auth.gi.ucsc.edu' website and it should confirm authentication on the top with a message saying "Successfully Authenticated". '''If you see any errors in red,''' but are sure you typed in your password and 2-factor auth correctly, click on the above link again (https://cg-kube-auth.gi.ucsc.edu) and authenticate a second time, which should work. There is a quirk where the web token doesn't always pass back to us correctly on the first try.
Upon success, you will be able to click the blue "Download Config File" button, which contains your initial kubernetes config file. Copy this file to your home directory as ~/.kube/config. Follow the directions on the web page to insert your '''"namespace:"''' line as directed. We will let you know which namespace to use.
==Testing Connectivity==
Once your ~/.kube/config file is set up correctly, you should be able to connect to the cluster. All our shared servers here at the Genomics Institute have the 'kubectl' command installed on them, but if you are coming from somewhere else, just make sure the "kubectl" utility is installed on that machine.
A quick test should go as follows:
$ kubectl get nodes
NAME          STATUS   ROLES    AGE   VERSION
k1.kube       Ready    <none>   13h   v1.15.3
k2.kube       Ready    <none>   13h   v1.15.3
k3.kube       Ready    <none>   13h   v1.15.3
master.kube   Ready    master   13h   v1.15.3
==Running Pods and Jobs with Requests and Limits==
When running jobs and pods on Kubernetes, you will always want to specify "requests" and "limits" on resources; otherwise your pods will get stuck with the default limits, which are tiny (to protect against runaway pods). You should always have an idea of how many resources your jobs will consume, and not request much more than that, so that you don't hog the cluster. Accurate limits also prevent your job from unexpectedly "running away" and chewing up more resources than intended.
Here is a good example of a job file that specifies limits:
job.yml
apiVersion: batch/v1
kind: Job
metadata:
  name: $USER-$TS
spec:
  backoffLimit: 0
  ttlSecondsAfterFinished: 30
  template:
    spec:
      containers:
      - name: magic
        image: robcurrie/ubuntu
        imagePullPolicy: Always
        resources:
          requests:
            cpu: "1"
            memory: "2G"
            ephemeral-storage: "2G"
          limits:
            cpu: "1"
            memory: "2G"
            ephemeral-storage: "3G"
        command: ["/bin/bash", "-c"]
        args: ['for i in {1..100}; do echo "$i: $(date)"; sleep 1; done']
      restartPolicy: Never
      priorityClassName: medium-priority
Please note that the "requests" and "limits" fields should be the same. You might think that you could set the limit higher than the request, but in reality they need to match in order for the pod to stay within the Kubernetes resource limit bubble. If you set the limit higher than the request, you risk the pod using more memory than the scheduler expects, and the node can start OOM-killing random other colocated pods, which is very, very bad for the cluster. If you omit the "requests" section altogether, the "limits" values will be used for both, so if you use only one, use "limits".
Also note the "priorityClassName" line. Available values are:
high-priority
medium-priority
low-priority
That affects how quickly your jobs move up the queue in the event there are a lot of queued jobs. Always use "medium-priority" as the default unless you specifically know you need it higher or lower. Higher priority jobs will always go in front of lower priority jobs.
'''NOTE:''' Jobs and pods that '''completed''' over 72 hours ago but have not been cleaned up will be automatically removed by the garbage collector. Most jobs will have the "ttlSecondsAfterFinished" configuration item in them, so they will automatically be cleaned up after that time expires. Leaving old pods and jobs around pins the disk space they were using for as long as they remain, so it's good to get rid of them as soon as they are done unless you are debugging a failure.
Jobs that '''run''' over 72 hours will not be deleted, only the ones that have '''exited''' over 72 hours ago.
A lot of other good information can be viewed on Rob Currie's github page, which includes examples and some "How To" documentation:
https://github.com/rcurrie/kubernetes
==Using Amazon S3==
To use S3 from a Kubernetes pod, the pod needs to have the "aws" command installed, and it needs the ~/.aws/credentials file, containing credentials that grant access, mounted in from a secret. Depending on your namespace, credentials may already be available in a "shared-s3-credentials" secret. If not, you can make a file called "credentials", populate it, and use "kubectl create secret generic secret-name --from-file credentials" to make an appropriate secret. Be sure to use AWS credentials that don't require assuming a role or MFA authentication!
Here's a minimal example job YAML that demonstrates using S3.
apiVersion: batch/v1
kind: Job
metadata:
  name: username-job
spec:
  ttlSecondsAfterFinished: 1000
  template:
    spec:
      containers:
      - name: main
        imagePullPolicy: Always
        image: ubuntu:18.04
        command:
        - /bin/bash
        - -c
        - |
          set -e
          DEBIAN_FRONTEND=noninteractive apt-get update
          DEBIAN_FRONTEND=noninteractive apt-get install -y awscli
          aws s3 cp --no-progress s3://bucket/bigfile .
        volumeMounts:
        - mountPath: /root/.aws
          name: s3-credentials
        resources:
          limits:
            cpu: 1
            memory: "4Gi"
            ephemeral-storage: "10Gi"
      restartPolicy: Never
      volumes:
      - name: s3-credentials
        secret:
          secretName: shared-s3-credentials
  backoffLimit: 0
When copying files to and from S3, it is good to use the "--no-progress" option to "aws s3 cp". The tool is not clever enough to notice that it isn't talking to a real terminal and suppress its progress bar, and the large number of bytes it sends to draw the progress bar can make it more difficult to inspect logs with k9s or "kubectl logs".
==Inlining Jobs in Shell and Shell in Jobs==
When interactively developing on Kubernetes, it can be useful to be able to have a shell command you can copy and paste to run a Kubernetes job, rather than having to create YAML files on disk. Similarly, it can be useful to have shell scripting inline in your Kubernetes job definitions, rather than having to bake your experimental script into a Docker container. Here's an example that does both, putting the YAML inside a heredoc and putting the script to run in the container inside a multiline YAML string. We precede this with a command to delete the job, so you can modify your script and re-paste it to replace a failed or failing job. We also make sure to mount the AWS credentials in the container, so that the ''aws'' command will be able to access S3 if you install it.
kubectl delete job username-job
kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: username-job
spec:
  ttlSecondsAfterFinished: 1000
  template:
    spec:
      containers:
      - name: main
        imagePullPolicy: Always
        image: ubuntu:18.04
        command:
        - /bin/bash
        - -c
        - |
          set -e
          DEBIAN_FRONTEND=noninteractive apt-get update
          DEBIAN_FRONTEND=noninteractive apt-get install -y awscli cowsay
          cowsay "Listing files"
          aws s3 ls s3://vg-k8s/
        volumeMounts:
        - mountPath: /tmp
          name: scratch-volume
        - mountPath: /root/.aws
          name: s3-credentials
        resources:
          limits:
            cpu: 1
            memory: "4Gi"
            ephemeral-storage: "10Gi"
      restartPolicy: Never
      volumes:
      - name: scratch-volume
        emptyDir: {}
      - name: s3-credentials
        secret:
          secretName: shared-s3-credentials
  backoffLimit: 0
EOF
Make sure to replace "username-job" with a unique job name that includes ''your'' username.
==View the Cluster's Current Activity==
One quick way to check the cluster's utilization is to do:
kubectl top nodes
NAME          CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
k1.kube       1815m        1%     1191Mi          0%
k2.kube       51837m       53%    46507Mi         12%
k3.kube       1458m        1%     61270Mi         15%
master.kube   111m         5%     1024Mi          46%
That means the worker nodes, k1, k2 and k3, are using minimal memory, and k2 is using 53% CPU, with plenty of room still open for new jobs. Ignore the master node: it only handles cluster management and doesn't run jobs or pods for users.
Another good way to get a lot of details about the current state of the cluster is through the Kubernetes Dashboard:
https://cgl-k8s-dashboard.gi.ucsc.edu/
Select the "token" login method, and paste in this (long) token:
eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZC10b2tlbi0ycDY4cCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImE5ZGI2Y2I0LWMyYWYtNDM3My04ZmM2LWE4YWYwYTBmNGRkNCIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlcm5ldGVzLWRhc2hib2FyZDprdWJlcm5ldGVzLWRhc2hib2FyZCJ9.ZVQjryG_ksfvReIfq4Frb6M4sE6OVDOXnFy9Aii-h3mrpdHRE6bgjdAvSGZ0jJSIUEz5GgPBQ0lCwhyZocivHHr4zTrNxMkOFZhPDnpvF6RVIDWTkqmH9Dg6qmro0gTJP75oKBpt7dFN2pW4zvqOAzqPmh7qxfoVusN8X6U13YirMFEf65-aGL-_FFNBsEzvjkC-BgXWbtk3YZc8CJL7xtvlKLyE6u6jC9Qx0SWnwzkALlxmzo_yYTDKpIrWiQGEqzLQOxKml-H0kSYLDX-t4sTivXp4vCw_ruoqwIpLnnQAC7q3ZtSTxHIrxbB7n_M8gfhpXtwprbPav-XmBk1xaQ
The dashboard is read-only, so you won't be able to edit anything; it's mostly for seeing what's going on and where.
You can also take a look at current resource consumption by taking a look at our Ganglia Cluster monitor tool:
https://ganglia.gi.ucsc.edu/
That website requires a username and password:
username: genecats
password: KiloKluster
That's mostly for keeping the script kiddies and bots from banging on it.
Once you get in, you should see a drop-down menu near the top left of the screen near "Genomics Institute Grid". From the drop-down menu, select "CG Kubernetes Cluster". It will take you to a page detailing the current resource usage and activity on the nodes. This can be useful for seeing if anyone else is using the whole cluster, or just for getting an idea of how many resources are available for your batch of jobs.
==Profiling with Perf==
You can use Linux's "perf" to profile your code on the Kubernetes cluster. Here is an example of a job that does so. You need to obtain a "perf" binary that matches the version of the kernel that the Kubernetes ''hosts'' are running, which most likely does not correspond to any version of "perf" available in the Ubuntu repositories. Here we download a binary previously uploaded to S3. Also, the Kubernetes hosts have '''Non-Uniform Memory Access (NUMA)''': some physical memory is "closer" to some physical cores than to other physical cores. The system is divided into '''NUMA nodes''', each containing some cores and some memory. Memory access from a node to its own memory is significantly faster than access to other nodes' memory. For consistent profiling, it is important to restrict your application to a single NUMA node if possible, with "numactl", so that all accesses are local to that node. If you don't do this, your application's performance will vary arbitrarily depending on whether and when threads are scheduled on the different NUMA nodes of the system.
apiVersion: batch/v1
kind: Job
metadata:
  name: username-profiling
spec:
  ttlSecondsAfterFinished: 1000
  template:
    metadata: # Apply a label saying that we use NUMA node 0
      labels:
        usesnuma0: "Yes"
    spec:
      affinity: # Say that we should not schedule on the same node as any other pod with that label
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: usesnuma0
                operator: In
                values:
                - "Yes"
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: main
        imagePullPolicy: Always
        image: ubuntu:18.04
        command:
        - /bin/bash
        - -c
        - |
          set -e
          DEBIAN_FRONTEND=noninteractive apt-get update
          DEBIAN_FRONTEND=noninteractive apt-get install -y awscli numactl
          # Use this particular perf binary that matches the hosts' kernels
          # If it is missing or outdated, get a new one from Erich or cluster-admin
          aws s3 cp --no-progress s3://vg-k8s/users/adamnovak/projects/test/perf /usr/bin/perf
          chmod +x /usr/bin/perf
          # Do your work with perf here.
          # Use numactl to limit your code to NUMA node 0 for consistent memory access times
        volumeMounts:
        - mountPath: /tmp
          name: scratch-volume
        - mountPath: /root/.aws
          name: s3-credentials
        resources:
          limits:
            cpu: 24 # One NUMA node on our machines is 24 cores.
            memory: "150Gi"
            ephemeral-storage: "400Gi"
      restartPolicy: Never
      volumes:
      - name: scratch-volume
        emptyDir: {}
      - name: s3-credentials
        secret:
          secretName: shared-s3-credentials
  backoffLimit: 0
13f00f3638539758390fb1c28d887974b7982004
217
214
2020-09-08T20:59:10Z
Weiler
3
/* View the Cluster's Current Activity */
wikitext
text/x-wiki
__TOC__
The Computational Genomics Group has a Kubernetes Cluster running on several large instances in AWS. The current cluster makeup includes three worker nodes, each with the following specs:
* 96 CPU cores (3.1 GHz)
* 384 GB RAM
* 3.3 TB Local NVMe Flash Storage
* 25 Gb/s Network Interface
==Getting Authorized to Connect==
If you require access to this Kubernetes cluster, contact Benedict Paten to ask for permission to use it, then forward that permission via email to:
cluster-admin@soe.ucsc.edu
Let us know which group you are with and we can authorize you to use the cluster in the correct namespace.
==Authenticating to Kubernetes==
We will authorize (authz) you to use the cluster on the server side, but you will also need to authenticate (authn) using your '@ucsc.edu' email address and a unique JSON Web Token (JWT). These credentials are installed in ~/.kube/config on whatever machine you connect from to reach the cluster.
To authenticate and get your base kubernetes configuration, go to the URL below, which will ask you to authenticate with Google. Use your '@ucsc.edu' email address as the login. It will then ask you to authenticate via CruzID Gold if your web browser doesn't already have the authentication token cached:
https://cg-kube-auth.gi.ucsc.edu
Once you authenticate (via username/password and 2-factor auth for CruzID Gold), it will pass you back to the 'https://cg-kube-auth.gi.ucsc.edu' website and it should confirm authentication on the top with a message saying "Successfully Authenticated". '''If you see any errors in red,''' but are sure you typed in your password and 2-factor auth correctly, click on the above link again (https://cg-kube-auth.gi.ucsc.edu) and authenticate a second time, which should work. There is a quirk where the web token doesn't always pass back to us correctly on the first try.
Upon success, you will be able to click the blue "Download Config File" button, which contains your initial kubernetes config file. Copy this file to your home directory as ~/.kube/config. Follow the directions on the web page to insert your '''"namespace:"''' line as directed. We will let you know which namespace to use.
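Installing the downloaded file from the shell might look like the sketch below. The download path is an assumption; adjust it to wherever your browser saved the file:

```shell
# Install the downloaded kubeconfig (~/Downloads/config is an assumed path).
mkdir -p ~/.kube
if [ -f ~/Downloads/config ]; then
    cp ~/Downloads/config ~/.kube/config
    chmod 600 ~/.kube/config   # keep your token private
fi
# After adding the "namespace:" line as directed, you can confirm it with:
#   grep 'namespace:' ~/.kube/config
```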
==Testing Connectivity==
Once your ~/.kube/config file is set up correctly, you should be able to connect to the cluster. All our shared servers here at the Genomics Institute have the 'kubectl' command installed on them, but if you are coming from somewhere else, just make sure the "kubectl" utility is installed on that machine.
A quick test should go as follows:
$ kubectl get nodes
NAME          STATUS   ROLES    AGE   VERSION
k1.kube       Ready    <none>   13h   v1.15.3
k2.kube       Ready    <none>   13h   v1.15.3
k3.kube       Ready    <none>   13h   v1.15.3
master.kube   Ready    master   13h   v1.15.3
==Running Pods and Jobs with Requests and Limits==
When running jobs and pods on Kubernetes, you will always want to specify "requests" and "limits" on resources; otherwise your pods will get stuck with the default limits, which are tiny (to protect against runaway pods). You should always have an idea of how many resources your jobs will consume, and not request much more than that, so that you don't hog the cluster. Accurate limits also prevent your job from unexpectedly "running away" and chewing up more resources than intended.
Here is a good example of a job file that specifies limits:
job.yml
apiVersion: batch/v1
kind: Job
metadata:
  name: $USER-$TS
spec:
  backoffLimit: 0
  ttlSecondsAfterFinished: 30
  template:
    spec:
      containers:
      - name: magic
        image: robcurrie/ubuntu
        imagePullPolicy: Always
        resources:
          requests:
            cpu: "1"
            memory: "2G"
            ephemeral-storage: "2G"
          limits:
            cpu: "1"
            memory: "2G"
            ephemeral-storage: "3G"
        command: ["/bin/bash", "-c"]
        args: ['for i in {1..100}; do echo "$i: $(date)"; sleep 1; done']
      restartPolicy: Never
      priorityClassName: medium-priority
Please note that the "requests" and "limits" fields should be the same. You might think that you could set the limit higher than the request, but in reality they need to match in order for the pod to stay within the Kubernetes resource limit bubble. If you set the limit higher than the request, you risk the pod using more memory than the scheduler expects, and the node can start OOM-killing random other colocated pods, which is very, very bad for the cluster. If you omit the "requests" section altogether, the "limits" values will be used for both, so if you use only one, use "limits".
Also note the "priorityClassName" line. Available values are:
high-priority
medium-priority
low-priority
That affects how quickly your jobs move up the queue in the event there are a lot of queued jobs. Always use "medium-priority" as the default unless you specifically know you need it higher or lower. Higher priority jobs will always go in front of lower priority jobs.
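The "$USER-$TS" placeholder in the example job file is not expanded by kubectl itself; it has to be filled in by the shell before submission. A sketch of one way to do that, assuming the file is saved as job.yml:

```shell
# Build a unique job name from your username and a timestamp, then
# substitute it for the literal $USER-$TS placeholder. The template line
# is fed in on stdin here for illustration; with a real file you would
# run: sed "s/\$USER-\$TS/${NAME}/" job.yml | kubectl apply -f -
TS=$(date +%s)
NAME="${USER:-nobody}-${TS}"
echo '  name: $USER-$TS' | sed "s/\$USER-\$TS/${NAME}/"
```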
'''NOTE:''' Jobs and pods that '''completed''' over 72 hours ago but have not been cleaned up will be automatically removed by the garbage collector. Most jobs will have the "ttlSecondsAfterFinished" configuration item in them, so they will automatically be cleaned up after that time expires. Leaving old pods and jobs around pins the disk space they were using for as long as they remain, so it's good to get rid of them as soon as they are done unless you are debugging a failure.
Jobs that '''run''' over 72 hours will not be deleted, only the ones that have '''exited''' over 72 hours ago.
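For reference, if you want a job's "ttlSecondsAfterFinished" to match the 72-hour garbage-collection window rather than the short values used in the examples, the arithmetic is:

```shell
# 72 hours expressed in seconds, for use as ttlSecondsAfterFinished:
echo $((72 * 60 * 60))   # prints 259200
```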
A lot of other good information can be viewed on Rob Currie's github page, which includes examples and some "How To" documentation:
https://github.com/rcurrie/kubernetes
==Using Amazon S3==
To use S3 from a Kubernetes pod, the pod needs to have the "aws" command installed, and it needs the ~/.aws/credentials file, containing credentials that grant access, mounted in from a secret. Depending on your namespace, credentials may already be available in a "shared-s3-credentials" secret. If not, you can make a file called "credentials", populate it, and use "kubectl create secret generic secret-name --from-file credentials" to make an appropriate secret. Be sure to use AWS credentials that don't require assuming a role or MFA authentication!
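As a sketch, creating such a secret from scratch might look like the following. The key values are placeholders, not real credentials, and the kubectl step is shown as a comment since it needs a working kubeconfig:

```shell
# Write an AWS credentials file in the standard INI format (placeholder values).
mkdir -p /tmp/s3-secret
cd /tmp/s3-secret
cat > credentials <<'EOF'
[default]
aws_access_key_id = AKIAEXAMPLEKEYID
aws_secret_access_key = exampleSecretAccessKeyValue
EOF
# Then, in your namespace, turn it into a secret:
#   kubectl create secret generic shared-s3-credentials --from-file credentials
```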
Here's a minimal example job YAML that demonstrates using S3.
apiVersion: batch/v1
kind: Job
metadata:
  name: username-job
spec:
  ttlSecondsAfterFinished: 1000
  template:
    spec:
      containers:
      - name: main
        imagePullPolicy: Always
        image: ubuntu:18.04
        command:
        - /bin/bash
        - -c
        - |
          set -e
          DEBIAN_FRONTEND=noninteractive apt-get update
          DEBIAN_FRONTEND=noninteractive apt-get install -y awscli
          aws s3 cp --no-progress s3://bucket/bigfile .
        volumeMounts:
        - mountPath: /root/.aws
          name: s3-credentials
        resources:
          limits:
            cpu: 1
            memory: "4Gi"
            ephemeral-storage: "10Gi"
      restartPolicy: Never
      volumes:
      - name: s3-credentials
        secret:
          secretName: shared-s3-credentials
  backoffLimit: 0
When copying files to and from S3, it is good to use the "--no-progress" option to "aws s3 cp". The tool is not clever enough to notice that it isn't talking to a real terminal and suppress its progress bar, and the large number of bytes it sends to draw the progress bar can make it more difficult to inspect logs with k9s or "kubectl logs".
==Inlining Jobs in Shell and Shell in Jobs==
When interactively developing on Kubernetes, it can be useful to be able to have a shell command you can copy and paste to run a Kubernetes job, rather than having to create YAML files on disk. Similarly, it can be useful to have shell scripting inline in your Kubernetes job definitions, rather than having to bake your experimental script into a Docker container. Here's an example that does both, putting the YAML inside a heredoc and putting the script to run in the container inside a multiline YAML string. We precede this with a command to delete the job, so you can modify your script and re-paste it to replace a failed or failing job. We also make sure to mount the AWS credentials in the container, so that the ''aws'' command will be able to access S3 if you install it.
kubectl delete job username-job
kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: username-job
spec:
  ttlSecondsAfterFinished: 1000
  template:
    spec:
      containers:
      - name: main
        imagePullPolicy: Always
        image: ubuntu:18.04
        command:
        - /bin/bash
        - -c
        - |
          set -e
          DEBIAN_FRONTEND=noninteractive apt-get update
          DEBIAN_FRONTEND=noninteractive apt-get install -y awscli cowsay
          cowsay "Listing files"
          aws s3 ls s3://vg-k8s/
        volumeMounts:
        - mountPath: /tmp
          name: scratch-volume
        - mountPath: /root/.aws
          name: s3-credentials
        resources:
          limits:
            cpu: 1
            memory: "4Gi"
            ephemeral-storage: "10Gi"
      restartPolicy: Never
      volumes:
      - name: scratch-volume
        emptyDir: {}
      - name: s3-credentials
        secret:
          secretName: shared-s3-credentials
  backoffLimit: 0
EOF
Make sure to replace "username-job" with a unique job name that includes ''your'' username.
==View the Cluster's Current Activity==
One quick way to check the cluster's utilization is to do:
kubectl top nodes
NAME          CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
k1.kube       1815m        1%     1191Mi          0%
k2.kube       51837m       53%    46507Mi         12%
k3.kube       1458m        1%     61270Mi         15%
master.kube   111m         5%     1024Mi          46%
That means the worker nodes, k1, k2 and k3, are using minimal memory, and k2 is using 53% CPU, with plenty of room still open for new jobs. Ignore the master node: it only handles cluster management and doesn't run jobs or pods for users.
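As a sanity check on those numbers, the CPU% column is just the millicore figure divided by each node's total capacity (96 cores = 96000 millicores); for k2 above:

```shell
# 51837 millicores out of 96000 available, as an integer percentage:
echo $((51837 * 100 / 96000))   # prints 53
```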
Another good way to get a lot of details about the current state of the cluster is through the Kubernetes Dashboard:
https://cgl-k8s-dashboard.gi.ucsc.edu/
Select the "token" login method, and paste in this (long) token:
eyJhbGciOiJSUzI1NiIsImtpZCI6InhhcTVJLWdkXzMzZzAxUENCdjNBYUJBbkZfZlBwSG9lVmd4S1dZbWZ6TncifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZC10b2tlbi1zNGxkbiIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImYwNWU4NjYyLWUyY2QtNDY3Yy1hYjY3LTNjNDc4ODVjZmM4YSIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlcm5ldGVzLWRhc2hib2FyZDprdWJlcm5ldGVzLWRhc2hib2FyZCJ9.pQQ3iaWgWGr4CVSxl0wvI9R0AqtxEMm_ElisnfejJtKb9g5ki7goL4VQ9n4lY1b0hO7ojfYcZzWC466FHLULPac6r_zRvme2YMi9EwyHU4iYfUVOktmcLPGl-NS_D3k-USJF8npqbn1OFSHS25pJ5924LFAC0dCkukanNODyNgbetplgkl8geG1pR_1dgqamJCB2xwDn2FjQBC-QjtUJnarGqeo1gqG3eeeWAImK3lGLnkYGPcsvwowmtOdjj2ScqCfjqlfkxWymMGAOB-iB7hEruYZ6dD4hrpIGuVSGQCHojm4FJo_AiFgRjBmfHZiRi0PV1PNoLQLRplpXMf2jOg
The dashboard is read-only, so you won't be able to edit anything; it's mostly for seeing what's going on and where.
You can also take a look at current resource consumption by taking a look at our Ganglia Cluster monitor tool:
https://ganglia.gi.ucsc.edu/
That website requires a username and password:
username: genecats
password: KiloKluster
That's mostly for keeping the script kiddies and bots from banging on it.
Once you get in, you should see a drop-down menu near the top left of the screen near "Genomics Institute Grid". From the drop-down menu, select "CG Kubernetes Cluster". It will take you to a page detailing the current resource usage and activity on the nodes. This can be useful for seeing if anyone else is using the whole cluster, or just for getting an idea of how many resources are available for your batch of jobs.
==Profiling with Perf==
You can use Linux's "perf" to profile your code on the Kubernetes cluster. Here is an example of a job that does so. You need to obtain a "perf" binary that matches the version of the kernel that the Kubernetes ''hosts'' are running, which most likely does not correspond to any version of "perf" available in the Ubuntu repositories. Here we download a binary previously uploaded to S3. Also, the Kubernetes hosts have '''Non-Uniform Memory Access (NUMA)''': some physical memory is "closer" to some physical cores than to other physical cores. The system is divided into '''NUMA nodes''', each containing some cores and some memory. Memory access from a node to its own memory is significantly faster than access to other nodes' memory. For consistent profiling, it is important to restrict your application to a single NUMA node if possible, with "numactl", so that all accesses are local to that node. If you don't do this, your application's performance will vary arbitrarily depending on whether and when threads are scheduled on the different NUMA nodes of the system.
apiVersion: batch/v1
kind: Job
metadata:
  name: username-profiling
spec:
  ttlSecondsAfterFinished: 1000
  template:
    metadata: # Apply a label saying that we use NUMA node 0
      labels:
        usesnuma0: "Yes"
    spec:
      affinity: # Say that we should not schedule on the same node as any other pod with that label
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: usesnuma0
                operator: In
                values:
                - "Yes"
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: main
        imagePullPolicy: Always
        image: ubuntu:18.04
        command:
        - /bin/bash
        - -c
        - |
          set -e
          DEBIAN_FRONTEND=noninteractive apt-get update
          DEBIAN_FRONTEND=noninteractive apt-get install -y awscli numactl
          # Use this particular perf binary that matches the hosts' kernels
          # If it is missing or outdated, get a new one from Erich or cluster-admin
          aws s3 cp --no-progress s3://vg-k8s/users/adamnovak/projects/test/perf /usr/bin/perf
          chmod +x /usr/bin/perf
          # Do your work with perf here.
          # Use numactl to limit your code to NUMA node 0 for consistent memory access times
        volumeMounts:
        - mountPath: /tmp
          name: scratch-volume
        - mountPath: /root/.aws
          name: s3-credentials
        resources:
          limits:
            cpu: 24 # One NUMA node on our machines is 24 cores.
            memory: "150Gi"
            ephemeral-storage: "400Gi"
      restartPolicy: Never
      volumes:
      - name: scratch-volume
        emptyDir: {}
      - name: s3-credentials
        secret:
          secretName: shared-s3-credentials
  backoffLimit: 0
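Based on the core counts above, here is a sketch of the node layout arithmetic and what a pinned profiling run might look like. "my_tool" is a placeholder binary, and the exact numactl flags depend on your workload:

```shell
# 96 cores per host, in 24-core NUMA nodes, means 4 NUMA nodes per machine:
echo $((96 / 24))   # prints 4
# Inside the container, a run pinned to NUMA node 0 might look like:
#   numactl --cpunodebind=0 --membind=0 perf record -g ./my_tool
#   perf report
```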
33dee01456042bcd9561c630cf5f3814fead4353
Genomics Institute Computing Information
0
6
216
181
2020-07-21T16:51:29Z
Jgarcia
2
wikitext
text/x-wiki
Welcome to the Genomics Institute Computing Information Repository! Browse the topics below for help in the area you are curious about.
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment (PRISM)==
*[[Access to the Firewalled Compute Servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Account Management ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
==Kubernetes Information==
*[[Computational Genomics Kubernetes Installation]]
*[[Undiagnosed Disease Project Kubernetes Installation]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
b715fc1bb613fec414edd9e2a569ff4561e54a61
230
216
2021-02-01T19:49:15Z
Weiler
3
/* Amazon Web Services Account Management */
wikitext
text/x-wiki
Welcome to the Genomics Institute Computing Information Repository! Browse the topics below for help in the area you are curious about.
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment (PRISM)==
*[[Access to the Firewalled Compute Servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Account Management ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
==Kubernetes Information==
*[[Computational Genomics Kubernetes Installation]]
*[[Undiagnosed Disease Project Kubernetes Installation]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
ac263ca76c4916f34a81701aac298968888bc0fc
236
230
2021-08-23T18:23:34Z
Weiler
3
/* Amazon Web Services Account Management */
wikitext
text/x-wiki
Welcome to the Genomics Institute Computing Information Repository! Browse the topics below for help in the area you are curious about.
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment (PRISM)==
*[[Access to the Firewalled Compute Servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
==Kubernetes Information==
*[[Computational Genomics Kubernetes Installation]]
*[[Undiagnosed Disease Project Kubernetes Installation]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
0cc2ffe880f4aa84ded9477ab945d4b661293284
Requirement for users to get GI VPN access
0
9
218
166
2020-10-05T18:58:54Z
Haifang
1
wikitext
text/x-wiki
If you need VPN access to the Genomics Institute firewalled/secure area (aka the "Prism" or "CIRM" Environment), please complete this form: https://app.smartsheet.com/b/form/a76dbd90ba0240ab9ea9d39b390586ce to request access. There are several requirements to gaining access to the firewalled area - please complete all these requirements '''BEFORE''' coming to have the VPN software set up for your laptop.
Please use this checklist to make sure that you have completed all '''six''' requirements.
'''1'''. User info, your PI info and your PI's approval
'''2'''. NIH Public Security Refresher Course Certificate
'''3'''. Signed Genomics Institute VPN User Agreement
'''4'''. Signed NIH Genomic Data Sharing Policy Agreement
'''5'''. "eduroam" wireless network set up on your laptop
'''6'''. Installed the appropriate OpenVPN software on your laptop
'''1''': You are required to ask your PI or sponsor to email cluster-admin@soe.ucsc.edu requesting a VPN account for you - this email should include:
* Your name
* Your PI's name
* Your requested username (if your name is Jane Doe, then your username could be 'jdoe', for example)
* PI's approval for this access
* What other access you need, such as a UNIX server account or access to OpenStack
'''2''': You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it when you come to your appointment to install the VPN software. You must complete the course in a single continuous sitting in order to be able to print the certificate at the end:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2018 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
'''3''': You need to print and sign the Genomics Institute VPN User Agreement and bring it with you to your VPN software installation appointment, located here for download:
[[Media:GI_VPN_Policy.pdf]]
'''4''': Please print, read and sign the last page of the NIH Genomic Data Sharing Policy agreement, located here for download. Just staple the pages together. Please bring the signed document to your appointment. By signing the document you agree that you have read and understand the policies described therein and that you agree to abide by those policies:
[[Media:NIH_GDS_Policy.pdf]]
'''5''': You will need access to the "eduroam" wireless network '''prior''' to your appointment. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions that prevent it from functioning. Some other universities have such restrictions (notably UCSF), but most other wireless networks and home networks should work fine.
'''6''': Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop:
A laptop running OS X, Windows or Ubuntu:
* For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Latest Stable version.
* For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
* For Ubuntu, please install network-manager-openvpn by typing:
 sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
We will correspond with you via email to schedule the appointment. The appointment can take up to 30 minutes per person, depending on whether any issues come up during the software setup. If you show up for your appointment without one (or more) of the requirements outlined above completed, we will have to reschedule for a time after you have completed them.
'''PLEASE NOTE:''' Because of the overhead required in setting up VPN access, please only request access if you have an immediate need to work on data that exists behind the firewall. We have had a decent number of people request access and go through the setup but then never use it. In other words, please do not request access because "one day you might need it", but because you '''do''' actually need it!
'''ALSO NOTE:''' VPN accounts expire after one year from the date of first gaining access. To renew for another year you will need your PI/sponsor to send us a note asking for renewal.
c001438af3e71d8edf58dbe2b8496249e5529665
224
218
2020-12-09T18:52:43Z
Haifang
1
wikitext
text/x-wiki
If you need VPN access to the Genomics Institute firewalled/secure area (aka the "Prism" or "CIRM" Environment), please complete this form: https://app.smartsheet.com/b/form/a76dbd90ba0240ab9ea9d39b390586ce.
There are two parts to this process:
1. For the user: please fill in ALL required fields and submit.
2. For the Sponsor/PI: you will receive an email from Smartsheet. Please fill in all required fields and submit.
Once we receive your completed request, we will create your account and go over the details with you in a short Zoom meeting.
There are several requirements for gaining access to the firewalled area - please complete all of them '''BEFORE''' coming to have the VPN software set up for your laptop.
Please use this checklist to make sure that you have completed all '''six''' requirements.
'''1'''. User info, your PI info and your PI's approval
'''2'''. NIH Public Security Refresher Course Certificate
'''3'''. Signed Genomics Institute VPN User Agreement
'''4'''. Signed NIH Genomic Data Sharing Policy Agreement
'''5'''. "eduroam" wireless network set up on your laptop
'''6'''. Installed the appropriate OpenVPN software on your laptop
'''1''': You are required to ask your PI or sponsor to email cluster-admin@soe.ucsc.edu requesting a VPN account for you - this email should include:
* Your name
* Your PI's name
* Your requested username (if your name is Jane Doe, then your username could be 'jdoe', for example)
* PI's approval for this access
* What other access you need, such as a UNIX server account or access to OpenStack
'''2''': You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it when you come to your appointment to install the VPN software. You must complete the course in a single continuous sitting in order to be able to print the certificate at the end:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2018 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
'''3''': You need to print and sign the Genomics Institute VPN User Agreement and bring it with you to your VPN software installation appointment, located here for download:
[[Media:GI_VPN_Policy.pdf]]
'''4''': Please print, read and sign the last page of the NIH Genomic Data Sharing Policy agreement, located here for download. Just staple the pages together. Please bring the signed document to your appointment. By signing the document you agree that you have read and understand the policies described therein and that you agree to abide by those policies:
[[Media:NIH_GDS_Policy.pdf]]
'''5''': You will need access to the "eduroam" wireless network '''prior''' to your appointment. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions that prevent it from functioning. Some other universities have such restrictions (notably UCSF), but most other wireless networks and home networks should work fine.
'''6''': Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop:
A laptop running OS X, Windows or Ubuntu:
* For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Latest Stable version.
* For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
* For Ubuntu, please install network-manager-openvpn by typing:
 sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
We will correspond with you via email to schedule the appointment. The appointment can take up to 30 minutes per person, depending on whether any issues come up during the software setup. If you show up for your appointment without one (or more) of the requirements outlined above completed, we will have to reschedule for a time after you have completed them.
'''PLEASE NOTE:''' Because of the overhead required in setting up VPN access, please only request access if you have an immediate need to work on data that exists behind the firewall. We have had a decent number of people request access and go through the setup but then never use it. In other words, please do not request access because "one day you might need it", but because you '''do''' actually need it!
'''ALSO NOTE:''' VPN accounts expire after one year from the date of first gaining access. To renew for another year you will need your PI/sponsor to send us a note asking for renewal.
7358d7b74ac2da1bfa1eb02e4ab8c57847fd3700
225
224
2020-12-09T18:59:56Z
Haifang
1
wikitext
text/x-wiki
If you need VPN access to the Genomics Institute firewalled/secure area (aka the "Prism" or "CIRM" Environment), please complete this form: https://app.smartsheet.com/b/form/a76dbd90ba0240ab9ea9d39b390586ce.
There are two parts to this process:
1. For the user: please fill in ALL required fields and attach all three required documents. See the links and instructions below.
2. For the Sponsor/PI: you will receive an email from Smartsheet. Please fill in all required fields and submit.
Once we receive your completed request, we will create your account and go over the details with you in a short Zoom meeting.
There are several requirements for gaining access to the firewalled area - please complete all of them '''BEFORE''' coming to have the VPN software set up for your laptop.
Please use this checklist to make sure that you have completed all '''six''' requirements.
'''1'''. User info, your PI info and your PI's approval
'''2'''. NIH Public Security Refresher Course Certificate
'''3'''. Signed Genomics Institute VPN User Agreement
'''4'''. Signed NIH Genomic Data Sharing Policy Agreement
'''5'''. "eduroam" wireless network set up on your laptop
'''6'''. Installed the appropriate OpenVPN software on your laptop
'''1''': You are required to ask your PI or sponsor to email cluster-admin@soe.ucsc.edu requesting a VPN account for you - this email should include:
* Your name
* Your PI's name
* Your requested username (if your name is Jane Doe, then your username could be 'jdoe', for example)
* PI's approval for this access
* What other access you need, such as a UNIX server account or access to OpenStack
'''2''': You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it when you come to your appointment to install the VPN software. You must complete the course in a single continuous sitting in order to be able to print the certificate at the end:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2018 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
'''3''': You need to print and sign the Genomics Institute VPN User Agreement and bring it with you to your VPN software installation appointment, located here for download:
[[Media:GI_VPN_Policy.pdf]]
'''4''': Please print, read and sign the last page of the NIH Genomic Data Sharing Policy agreement, located here for download. Just staple the pages together. Please bring the signed document to your appointment. By signing the document you agree that you have read and understand the policies described therein and that you agree to abide by those policies:
[[Media:NIH_GDS_Policy.pdf]]
'''5''': You will need access to the "eduroam" wireless network '''prior''' to your appointment. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions that prevent it from functioning. Some other universities have such restrictions (notably UCSF), but most other wireless networks and home networks should work fine.
'''6''': Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop:
A laptop running OS X, Windows or Ubuntu:
* For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Latest Stable version.
* For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
* For Ubuntu, please install network-manager-openvpn by typing:
 sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
We will correspond with you via email to schedule the appointment. The appointment can take up to 30 minutes per person, depending on whether any issues come up during the software setup. If you show up for your appointment without one (or more) of the requirements outlined above completed, we will have to reschedule for a time after you have completed them.
'''PLEASE NOTE:''' Because of the overhead required in setting up VPN access, please only request access if you have an immediate need to work on data that exists behind the firewall. We have had a decent number of people request access and go through the setup but then never use it. In other words, please do not request access because "one day you might need it", but because you '''do''' actually need it!
'''ALSO NOTE:''' VPN accounts expire after one year from the date of first gaining access. To renew for another year you will need your PI/sponsor to send us a note asking for renewal.
221d18393fe6aad89b9b4b6d566254a3738b40e6
226
225
2020-12-09T19:04:35Z
Haifang
1
wikitext
text/x-wiki
If you need VPN access to the Genomics Institute firewalled/secure area (aka the "Prism" or "CIRM" Environment), please complete this form: https://app.smartsheet.com/b/form/a76dbd90ba0240ab9ea9d39b390586ce.
There are two parts to this process:
1. For the user: please fill in ALL required fields and attach all three required documents. See the links and instructions below.
2. For the Sponsor/PI: you will receive an email from Smartsheet. Please fill in all required fields and submit.
Once we receive your completed request, we will create your account and go over the details with you in a short Zoom meeting.
'''1''': You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it when you come to your appointment to install the VPN software. You must complete the course in a single continuous sitting in order to be able to print the certificate at the end:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2018 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
'''2''': You need to print and sign the Genomics Institute VPN User Agreement and bring it with you to your VPN software installation appointment, located here for download:
[[Media:GI_VPN_Policy.pdf]]
'''3''': Please print, read and sign the last page of the NIH Genomic Data Sharing Policy agreement, located here for download. Just staple the pages together. Please bring the signed document to your appointment. By signing the document you agree that you have read and understand the policies described therein and that you agree to abide by those policies:
[[Media:NIH_GDS_Policy.pdf]]
You will need access to the "eduroam" wireless network '''prior''' to your zoom appointment if you are on campus. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions that prevent it from functioning. Some other universities have such restrictions (notably UCSF), but most other wireless networks and home networks should work fine.
Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop:
A laptop running OS X, Windows or Ubuntu:
* For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Latest Stable version.
* For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
* For Ubuntu, please install network-manager-openvpn by typing:
 sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
We will correspond with you via email to schedule the appointment. The Zoom meeting can take up to 30 minutes per person, depending on whether any issues come up during the software setup. If you come to your appointment without one (or more) of the requirements outlined above completed, we will have to reschedule for a time after you have completed them.
'''PLEASE NOTE:''' Because of the overhead required in setting up VPN access, please only request access if you have an immediate need to work on data that exists behind the firewall. We have had a decent number of people request access and go through the setup but then never use it. In other words, please do not request access because "one day you might need it", but because you '''do''' actually need it!
'''ALSO NOTE:''' VPN accounts expire after one year from the date of first gaining access. To renew for another year you will need your PI/sponsor to send us a note asking for renewal.
963ebea357ac6540c743adff94464b9afafd8021
227
226
2020-12-09T19:05:43Z
Haifang
1
wikitext
text/x-wiki
If you need VPN access to the Genomics Institute firewalled/secure area (aka the "Prism" or "CIRM" Environment), please complete this form: https://app.smartsheet.com/b/form/a76dbd90ba0240ab9ea9d39b390586ce.
There are two parts to this process:
1. For the user: please fill in ALL required fields and attach all three required documents. See the links and instructions below.
2. For the Sponsor/PI: you will receive an email from Smartsheet. Please fill in all required fields and submit.
Once we receive your completed request, we will create your account and go over the details with you in a short Zoom meeting.
Here are the links to the required documents.
'''1''': You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it when you come to your appointment to install the VPN software. You must complete the course in a single continuous sitting in order to be able to print the certificate at the end:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2018 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
'''2''': You need to print and sign the Genomics Institute VPN User Agreement and bring it with you to your VPN software installation appointment, located here for download:
[[Media:GI_VPN_Policy.pdf]]
'''3''': Please print, read and sign the last page of the NIH Genomic Data Sharing Policy agreement, located here for download. Just staple the pages together. Please bring the signed document to your appointment. By signing the document you agree that you have read and understand the policies described therein and that you agree to abide by those policies:
[[Media:NIH_GDS_Policy.pdf]]
You will need access to the "eduroam" wireless network '''prior''' to your zoom appointment if you are on campus. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions that prevent it from functioning. Some other universities have such restrictions (notably UCSF), but most other wireless networks and home networks should work fine.
Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop:
A laptop running OS X, Windows or Ubuntu:
* For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Latest Stable version.
* For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
* For Ubuntu, please install network-manager-openvpn by typing:
 sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
We will correspond with you via email to schedule the appointment. The Zoom meeting can take up to 30 minutes per person, depending on whether any issues come up during the software setup. If you come to your appointment without one (or more) of the requirements outlined above completed, we will have to reschedule for a time after you have completed them.
'''PLEASE NOTE:''' Because of the overhead required in setting up VPN access, please only request access if you have an immediate need to work on data that exists behind the firewall. We have had a decent number of people request access and go through the setup but then never use it. In other words, please do not request access because "one day you might need it", but because you '''do''' actually need it!
'''ALSO NOTE:''' VPN accounts expire after one year from the date of first gaining access. To renew for another year you will need your PI/sponsor to send us a note asking for renewal.
e36d96e020d451b30d2208c7773cb7348009f9ec
244
227
2021-11-15T20:18:44Z
Haifang
1
wikitext
text/x-wiki
If you need VPN access to the Genomics Institute firewalled/secure area (aka the "Prism" or "CIRM" Environment), please complete this form: https://app.smartsheet.com/b/form/a76dbd90ba0240ab9ea9d39b390586ce.
There are two parts to this process:
1. For the user: please fill in ALL required fields and attach all three required documents. See the links and instructions below.
2. For the Sponsor/PI: you will receive an email from Smartsheet. Please fill in all required fields and submit.
Once we receive your completed request, we will create your account and go over the details with you in a short Zoom meeting.
Here are the links to the required documents.
'''1''': You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it when you come to your appointment to install the VPN software. You must complete the course in a single continuous sitting in order to be able to print the certificate at the end:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2021 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
'''2''': You need to print and sign the Genomics Institute VPN User Agreement and bring it with you to your VPN software installation appointment, located here for download:
[[Media:GI_VPN_Policy.pdf]]
'''3''': Please print, read and sign the last page of the NIH Genomic Data Sharing Policy agreement, located here for download. Just staple the pages together. Please bring the signed document to your appointment. By signing the document you agree that you have read and understand the policies described therein and that you agree to abide by those policies:
[[Media:NIH_GDS_Policy.pdf]]
You will need access to the "eduroam" wireless network '''prior''' to your zoom appointment if you are on campus. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions that prevent it from functioning. Some other universities have such restrictions (notably UCSF), but most other wireless networks and home networks should work fine.
Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop:
A laptop running OS X, Windows or Ubuntu:
* For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Latest Stable version.
* For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
* For Ubuntu, please install network-manager-openvpn by typing:
 sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
We will correspond with you via email to schedule the appointment. The Zoom meeting can take up to 30 minutes per person, depending on whether any issues come up during the software setup. If you come to your appointment without one (or more) of the requirements outlined above completed, we will have to reschedule for a time after you have completed them.
'''PLEASE NOTE:''' Because of the overhead required in setting up VPN access, please only request access if you have an immediate need to work on data that exists behind the firewall. We have had a decent number of people request access and go through the setup but then never use it. In other words, please do not request access because "one day you might need it", but because you '''do''' actually need it!
'''ALSO NOTE:''' VPN accounts expire after one year from the date of first gaining access. To renew for another year you will need your PI/sponsor to send us a note asking for renewal.
5dfab1a4ce6abd64efd548f0e60e1dd3a3ccec17
Overview of Getting and Using an AWS IAM Account
0
21
219
176
2020-10-26T16:23:42Z
Anovak
4
/* API Access and Secret Keys */ Show how you have to have the config for Toil to work
wikitext
text/x-wiki
__TOC__
== Getting AWS (Amazon Web Services) Access ==
The Genomics Institute has a series of AWS accounts that support different projects. If you become associated with one or more of those projects, you will need access to the corresponding account or accounts. We manage AWS IAM account access through a single 'top level' account that everyone logs into; once you are logged in there, you can "Switch Role" into the sub-account you actually run things in.
To get access, ask your PI or Project Manager to email cluster-admin (cluster-admin@soe.ucsc.edu) requesting an AWS account for you and naming the projects you will have access to. The cluster-admin group will contact you with your login credentials. Once you log in, you can change your password if you want to, and you will be able to set up MFA (Multi-Factor Authentication) for your account. You will be required to use MFA in order to "Switch Role" into any of the sub-accounts for the projects you are working on.
The login URL to use when logging in to the top level account is listed below. The top level account is known as "gi-gateway":
https://gi-gateway.signin.aws.amazon.com/console
When you log in, you '''may''' see a couple of error messages on the AWS dashboard saying you don't have access to view certain resources - '''this is normal''', so just ignore them.
== Configuring Account Credentials ==
Once you log in to gi-gateway, you will have very few permissions to do anything there - which is normal, since you will not be working in that account anyway. The gi-gateway account is just there to authenticate you to AWS.
'''Changing Your Password'''
You can change your password by clicking on your username on the top right of the web browser window, just to the right of the little bell. If your username is bill@ucsc.edu, for example:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window.
* Click the '''"My Security Credentials"''' drop-down menu option.
* Click the '''"Change Password"''' button to change your password.
Note that we have a password strength policy in place, so your password must conform to the following requirements:
* Your password must be at least 10 characters long
* Your password must contain at least one lowercase letter
* Your password must contain at least one non-alphanumeric character
* Your password must contain at least one number
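As an aside, the four rules above can be checked locally before you submit a new password. This is a sketch for illustration only (the function name is made up here); AWS enforces the real policy server-side:

```python
import re

def meets_gi_password_policy(password):
    """Check a candidate password against the gi-gateway policy:
    at least 10 characters, at least one lowercase letter,
    at least one number, and at least one non-alphanumeric character."""
    return (
        len(password) >= 10
        and re.search(r"[a-z]", password) is not None
        and re.search(r"[0-9]", password) is not None
        and re.search(r"[^a-zA-Z0-9]", password) is not None
    )

print(meets_gi_password_policy("short1!"))       # False - fewer than 10 characters
print(meets_gi_password_policy("longenough1!"))  # True - satisfies all four rules
```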
You will also need to configure MFA on your account before you will be allowed to switch roles into another account.
'''Configuring MFA'''
To configure MFA (Multi Factor Authentication), the most common way to do it is to use '''Google Authenticator''', which is an app available for Apple and Android based cell phones and mobile devices. The app is free, simply download it from the app store to your cell phone or tablet to get started. Other MFA apps may also work but we have not tested everything out there.
Once you have Google Authenticator installed, log into the gi-gateway account using the above URL, then:
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"My Security Credentials"''' drop-down menu option.
* Scroll down to the MFA (Multi-Factor Authentication) section of the page, and click '''"Assign MFA Device"'''.
* In the following menu select '''"Virtual MFA Device"'''.
* In the following window click the '''"Show QR Code"''' link, and the MFA QR barcode will appear on your screen.
* Open the Google Authenticator app on your mobile device, and click the little "+" symbol in the top right corner of the app to add an account.
* Select "Scan Barcode" in the Google Authenticator app, and aim your mobile device's camera at the QR barcode.
* The new MFA device should then be set up, and you should see a 6-digit code with a small timer to the right of it. When asked, type the currently displayed 6-digit code into the first field in your web browser, wait for the next code to appear after the timer expires, and type that one into the second field. It should then inform you that you have successfully associated an MFA device with your account.
Once you have associated an MFA device with the 'gi-gateway' Account, '''log out''', then log back in. It will ask for your username and password, and then ask for your MFA code, which you can view by opening Google Authenticator and seeing what code it is displaying at that time. The code changes every 30 seconds or so. '''You must log out first and log back in using MFA in order to be able to switch roles!!!'''
== Switching Roles into Another AWS Account ==
Now that you have configured a password and enabled MFA, you will be allowed to "Switch Roles" into another account such that you can begin work there. The first time you switch roles into an account it will ask you a few questions, but subsequently it will remember which roles you have access to and they will become a menu item you can click on to quickly switch roles. Let's assume that you want to switch to the 'pangenomics' AWS account, and you have been already granted access to do so by the cluster-admin group. After logging into the 'gi-gateway' account at the URL listed here (same as above):
https://gi-gateway.signin.aws.amazon.com/console
Do the following to switch roles into the 'pangenomics' account (as an example):
* Click '''"bill@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, bill@ucsc.edu is an example).
* Click the '''"Switch Role"''' option in the drop-down menu.
* In the following menu it will ask you about the role you will be assuming. In our example we will use the following:
-Account* = pangenomics
-Role* = developer
-Display Name = [leave blank, or use a short phrase]
-Color = [choose a color for this role]
* Then click the "Switch Role" button.
If all went well you should be dumped into the 'pangenomics' account, and you should be identified in the top right hand corner of the page as '''"developer @ pangenomics"''', indicating your role and the account you are active in. You can then work as normal in that account. If you have not yet been given access to that role, you will receive an error message and not be allowed to switch roles.
'''NOTE:''' When you switch roles, it may dump you into a region that you don't expect it to. Always verify the region you are in by looking at the top right of the web page - it will display your region there. Most of our stuff exists in "Oregon" (us-west-2), but some items appear in other regions on a per-case basis.
If you wish to switch context back to the 'gi-gateway' account in order to manage something, or to switch to another role in another account, simple:
* Click '''"developer @ pangenomics"''' in the top right corner of the window.
* Select '''"Back to bill@ucsc.edu"'''
You will then be sent back to the 'gi-gateway' context, and you can add another role to switch into, manage your credentials and further switch roles.
== API Access and Secret Keys ==
If you require programmatic access to AWS, you will very likely be familiar with the AWS concept of Access Keys and Secret Keys, which can be used by scripts to authenticate yourself to AWS and use the APIs there without using the web console to authenticate. In the past, access keys and secret keys could be used by users with no further authentication. This introduces a security risk, as the management of those keys must be carefully guarded - if anyone gets your keys, they can rack up charges on your AWS account without your knowledge!
Using the "Assume Role" mechanism we are now using, Access Keys and Secret Keys can still be created by users '''while logged into the gi-gateway account only'''. Do not try to create keys while you have "Switched Roles" into another account. Keys you create in the top level 'gi-gateway' account will work for you in any sub-account you have access to switch roles to. You will need to do a little more configuration for your keys to work from a UNIX command line however.
To set up your access and secret keys for the first time (again, logged into the 'gi-gateway' account only), follow these instructions. Once you log into the gi-gateway web interface, click on your username in the top right corner of the browser window, then click "My Security Credentials". In that screen you will see an "Access Keys" section, and you will have one key listed. Delete that key (using the "Delete" button on the right side of the key), then create a new key using the "Create Access Key" button. It will show you your access and secret key ONCE, so make sure to copy and paste it somewhere.
It should be noted that we recommend awscli version 1.16.187 or later, as earlier versions have documented issues with using profiles and MFA related actions. You can determine your version of awscli by doing:
aws --version
=== Entering Base Credentials ===
Generically, if you plan on using keys for API Access, minimally you will need to configure the "aws" utility and then tweak the config a bit for our setup. To start, run "aws configure". It should look something like this (put in your access and secret keys that you created in the previous step):
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]:
Most folks do that to start. It creates two files:
~/.aws/config
~/.aws/credentials
Those two files are important to access AWS via the 'aws' command.
'''~/.aws/credentials'''
This file contains your access key and secret key, and should not need to be modified after running 'aws configure'. Your same keys can be used to access any roles in any accounts you have access to.
'''~/.aws/config'''
This file contains some account information you will need to tweak. There are a few ways you could set it up.
=== Adjusting Configuration for Toil or a Single Role ===
If you usually use a single role for a single project, or if you need to use Toil with a particular role, you should configure it like this, so that that role is automatically assumed for every operation by default:
[default]
region = us-west-2
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/melinda@ucsc.edu
duration_seconds = 43200
The "role_arn" line contains the role and account number you are accessing. You can see a list of live account numbers here:
[[AWS Account List and Numbers]]
Find the account number you need and enter it on the role_arn line, as well as the role name. You will get the role name from the cluster-admin group when you get access.
The 'mfa_serial' line contains the identifier for your MFA device. It will always look like '''"arn:aws:iam::652235167018:mfa/[your_iam_username]"'''. The account number listed there will always be "652235167018" because that is the account number of the top level "gi-gateway" account.
The "duration_seconds" parameter says that your session token will be 43200 seconds long (12 hours). That means you will only have to authenticate with MFA once every 12 hours. 12 hours is the maximum you can request, although you can specify less than that. This means it won't ask you for MFA every time you run a command for the next 12 hours.
Once that is configured, you should be able to use the aws command without any profile specified, and have it automatically assume a role to grant you access:
$ aws s3 ls
It will ask you for your MFA code and then run the command. Once you enter the MFA code, the token it creates will be valid for 12 hours if you specified "duration_seconds = 43200", or if you omitted that line, the default session duration is one hour, so you can run other 'aws' cli commands without the need to re-authenticate with MFA for the duration of the session. After the session expires, you will need to authenticate via MFA again.
=== Adjusting Configuration for Multiple Roles ===
If you have multiple roles that you use equally often, and you don't need to use Toil, you can configure it something like this, with multiple profiles:
[default]
region = us-west-2
[profile pangenomics-developer]
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/melinda@ucsc.edu
duration_seconds = 43200
Once that is configured, you should be able to reference the profile you just created when using the aws command, like so:
$ aws s3 ls --profile pangenomics-developer
==Tag Your Resources==
When you start using AWS resources (instances, networks, etc), it is very important that you "tag" your resources with the "Owner" tag (note the capital "O"). "Owner" is the key, and the value assigned to it will be your IAM username (i.e. your email address). So, for example, if I spin up an instance, I would tag it during or after creation with something like:
Owner = bob@ucsc.edu
If you do not tag your instances, '''they will automatically be terminated within 10 minutes.''' Tag your instances especially, but tag every resource you create! This allows us to perform accounting tasks much more easily and allows the Program Managers to know which resources are controlled by who.
bb63fef1d577393f44c9a7be8a01f986f0f6587f
220
219
2020-10-26T16:29:52Z
Anovak
4
I changed the example email in one section so now I have to change it everywhere
wikitext
text/x-wiki
__TOC__
== Getting AWS (Amazon Web Services) Access ==
The Genomics Institute maintains a number of AWS accounts, each supporting different projects. If you become associated with one or more of those projects, you will need access to the corresponding account or accounts. We manage AWS IAM access through a single 'top level' account that everyone logs into; from there, you "Switch Role" into the sub-account where you actually run things.
To get access, ask your PI or Project Manager to email cluster-admin (cluster-admin@soe.ucsc.edu) requesting an AWS account for you, naming the projects you will need access to. The cluster-admin group will then contact you with login credentials. Once you log in, you can change your password and set up MFA (Multi-Factor Authentication) for your account. You will be required to use MFA in order to "Switch Role" into any of the sub-accounts for the projects you are working on.
The login URL to use when logging in to the top level account is listed below. The top level account is known as "gi-gateway":
https://gi-gateway.signin.aws.amazon.com/console
When you log in, you '''may''' see a couple of error messages on the AWS dashboard saying you don't have access to view certain resources - '''this is normal''', so just ignore them.
== Configuring Account Credentials ==
Once you log in to gi-gateway, you will have very few permissions there - this is normal, since you will not be doing any work in that account. The gi-gateway account exists only to authenticate you to AWS.
'''Changing Your Password'''
You can change your password by clicking on your username on the top right of the web browser window, just to the right of the little bell. If your username is melinda@ucsc.edu, for example:
* Click '''"melinda@ucsc.edu @ gi-gateway"''' on the top right of your browser window.
* Click the '''"My Security Credentials"''' drop-down menu option.
* Click the '''"Change Password"''' button to change your password.
Note that we have a password strength policy in place, so your password must conform to the following requirements:
* Your password must be at least 10 characters long
* Your password must contain at least one lowercase letter
* Your password must contain at least one non-alphanumeric character
* Your password must contain at least one number
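As a quick self-check before submitting a new password, the four rules above can be written down in a few lines of Python (a sketch of the policy as stated here; AWS enforces the real one):

```python
def meets_policy(pw):
    """Check a candidate password against the gi-gateway password policy."""
    return (len(pw) >= 10                          # at least 10 characters
            and any(c.islower() for c in pw)       # at least one lowercase letter
            and any(c.isdigit() for c in pw)       # at least one number
            and any(not c.isalnum() for c in pw))  # at least one non-alphanumeric

print(meets_policy("correcthorse7!"))  # -> True
print(meets_policy("Short1!"))         # -> False (fewer than 10 characters)
```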
You will also need to configure MFA on your account before you will be allowed to switch roles into another account.
'''Configuring MFA'''
The most common way to configure MFA (Multi-Factor Authentication) is with '''Google Authenticator''', a free app for Apple and Android phones and tablets; download it from your device's app store to get started. Other MFA apps may also work, but we have not tested everything out there.
Once you have Google Authenticator installed, log into the gi-gateway account using the above URL, then:
* Click '''"melinda@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, melinda@ucsc.edu is an example).
* Click the '''"My Security Credentials"''' drop-down menu option.
* Scroll down to the MFA (Multi-Factor Authentication) section of the page, and click '''"Assign MFA Device"'''.
* In the following menu select '''"Virtual MFA Device"'''.
* In the following window click the '''"Show QR Code"''' link, and the MFA QR barcode will appear on your screen.
* Open the Google Authenticator app on your mobile device, and click the little "+" symbol in the top right corner of the app to add an account.
* Select "Scan Barcode" in the Google Authenticator app, and aim your mobile device camera at the QR barcode.
* The new MFA device should then be set up, and you should see a 6-digit number with a small timer to the right of it. When asked, type the 6-digit code it displays into your web browser, wait for the next code to appear after the timer expires, and type that into the second field. It should then inform you that you have successfully associated an MFA device with your account.
Once you have associated an MFA device with the 'gi-gateway' account, '''log out''', then log back in. You will be asked for your username and password, then for your MFA code, which you can read from Google Authenticator at that moment. The code changes every 30 seconds. '''You must log out first and log back in using MFA in order to be able to switch roles!'''
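For the curious: a virtual MFA device is just a shared secret (encoded in the QR barcode) plus the TOTP algorithm (RFC 6238), which hashes the secret together with the current 30-second time step to produce the 6-digit code. A minimal sketch in Python, using the RFC test-vector secret rather than a real one:

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32, at=None, step=30, digits=6):
    """Compute an RFC 6238 TOTP code from a base32-encoded shared secret."""
    # Pad the base32 string to a multiple of 8 characters, as the codec requires.
    key = base64.b32decode(secret_b32.upper() + "=" * (-len(secret_b32) % 8))
    counter = int(time.time() if at is None else at) // step
    # HMAC-SHA1 over the big-endian time-step counter, then dynamic truncation.
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = (struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF) % (10 ** digits)
    return str(code).zfill(digits)

# RFC 4226/6238 test secret "12345678901234567890" in base32:
print(totp("GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ", at=59))  # -> 287082
```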
== Switching Roles into Another AWS Account ==
Now that you have set a password and enabled MFA, you will be allowed to "Switch Roles" into another account and begin work there. The first time you switch roles into an account, it will ask you a few questions; after that, it remembers the roles you have access to and presents them as menu items you can click to switch quickly. As an example, assume you want to switch to the 'pangenomics' AWS account and the cluster-admin group has already granted you access. After logging into the 'gi-gateway' account at the URL listed here (same as above):
https://gi-gateway.signin.aws.amazon.com/console
Do the following to switch roles into the 'pangenomics' account (as an example):
* Click '''"melinda@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, melinda@ucsc.edu is an example).
* Click the '''"Switch Role"''' option in the drop-down menu.
* In the following menu it will ask you about the role you will be assuming. In our example we will use the following:
** Account* = pangenomics
** Role* = developer
** Display Name = [leave blank, or use a short phrase]
** Color = [choose a color for this role]
* Then click the '''"Switch Role"''' button.
If all went well you should be dumped into the 'pangenomics' account, and you should be identified in the top right hand corner of the page as '''"developer @ pangenomics"''', indicating your role and the account you are active in. You can then work as normal in that account. If you have not yet been given access to that role, you will receive an error message and not be allowed to switch roles.
'''NOTE:''' When you switch roles, you may land in a region you don't expect. Always verify which region you are in by looking at the top right of the web page - it is displayed there. Most of our resources are in "Oregon" (us-west-2), but some items live in other regions on a case-by-case basis.
If you wish to switch back to the 'gi-gateway' account in order to manage something, or to switch to a role in another account, simply:
* Click '''"developer @ pangenomics"''' in the top right corner of the window.
* Select '''"Back to melinda@ucsc.edu"'''
You will then be returned to the 'gi-gateway' context, where you can add another role to switch into, manage your credentials, or switch roles again.
== API Access and Secret Keys ==
If you require programmatic access to AWS, you are likely familiar with AWS Access Keys and Secret Keys, which scripts can use to authenticate to the AWS APIs without going through the web console. In the past, access keys and secret keys could be used with no further authentication. That is a security risk: the keys must be carefully guarded, because anyone who obtains them can rack up charges on your AWS account without your knowledge!
Using the "Assume Role" mechanism we are now using, Access Keys and Secret Keys can still be created by users '''while logged into the gi-gateway account only'''. Do not try to create keys while you have "Switched Roles" into another account. Keys you create in the top level 'gi-gateway' account will work for you in any sub-account you have access to switch roles to. You will need to do a little more configuration for your keys to work from a UNIX command line however.
To set up your access and secret keys for the first time (again, while logged into the 'gi-gateway' account only): log into the gi-gateway web interface, click your username in the top right corner of the browser window, then click "My Security Credentials". In the "Access Keys" section of that screen you will have one key listed. Delete that key (using the "Delete" button on the right side of the key), then create a new one with the "Create Access Key" button. It will show you your access key and secret key only ONCE, so be sure to copy them somewhere safe.
We recommend awscli version 1.16.187 or later; earlier versions have documented issues with profiles and MFA-related actions. You can check your version of awscli by doing:
aws --version
=== Entering Base Credentials ===
If you plan on using keys for API access, you will minimally need to configure the "aws" utility and then tweak the config a bit for our setup. To start, run "aws configure". It should look something like this (enter the access and secret keys you created in the previous step):
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]:
Most folks do that to start. It creates two files:
~/.aws/config
~/.aws/credentials
Those two files are important to access AWS via the 'aws' command.
'''~/.aws/credentials'''
This file contains your access key and secret key, and should not need to be modified after running 'aws configure'. Your same keys can be used to access any roles in any accounts you have access to.
'''~/.aws/config'''
This file contains some account information you will need to tweak. There are a few ways you could set it up.
=== Adjusting Configuration for Toil or a Single Role ===
If you usually use a single role for a single project, or if you need to use Toil with a particular role, configure it like this, so that the role is automatically assumed for every operation by default:
[default]
region = us-west-2
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/melinda@ucsc.edu
duration_seconds = 43200
The "role_arn" line contains the role and account number you are accessing. You can see a list of live account numbers here:
[[AWS Account List and Numbers]]
Find the account number you need and enter it on the role_arn line, as well as the role name. You will get the role name from the cluster-admin group when you get access.
The 'mfa_serial' line contains the identifier for your MFA device. It will always look like '''"arn:aws:iam::652235167018:mfa/[your_iam_username]"'''. The account number listed there will always be "652235167018" because that is the account number of the top level "gi-gateway" account.
The "duration_seconds" parameter sets your session token lifetime to 43200 seconds (12 hours). 12 hours is the maximum you can request, although you can specify less. For that duration you only have to authenticate with MFA once; it won't ask again for every command you run.
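If you want to sanity-check the file from a script, these profile sections are plain INI and can be read with Python's configparser. A sketch using the example values from this page (the role and account numbers shown are the ones above, not necessarily yours):

```python
import configparser

# The same example profile as above, inlined here for a self-contained demo.
SAMPLE_CONFIG = """\
[default]
region = us-west-2
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/melinda@ucsc.edu
duration_seconds = 43200
"""

cfg = configparser.ConfigParser()
cfg.read_string(SAMPLE_CONFIG)  # for the real file: cfg.read(os.path.expanduser("~/.aws/config"))

default = cfg["default"]
account_id = default["role_arn"].split(":")[4]   # sub-account number from the ARN
hours = int(default["duration_seconds"]) / 3600  # 43200 s -> 12.0 h
assert default["mfa_serial"].startswith("arn:aws:iam::652235167018:mfa/")
print(account_id, hours)  # -> 422448306679 12.0
```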
Once that is configured, you should be able to use the aws command without any profile specified, and have it automatically assume a role to grant you access:
$ aws s3 ls
It will ask for your MFA code and then run the command. Once you enter the code, the token it creates is valid for 12 hours if you specified "duration_seconds = 43200" (if you omitted that line, the default session duration is one hour), so you can run further 'aws' CLI commands without re-authenticating for the life of the session. After the session expires, you will need to authenticate via MFA again.
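The CLI caches the temporary credentials from the assume-role call (by convention under ~/.aws/cli/cache, as JSON files with an ISO-8601 "Expiration" field). If you are curious how long a session has left, the arithmetic is simple; a sketch, with a made-up cache entry:

```python
from datetime import datetime, timezone

def seconds_left(expiration_iso, now=None):
    """Seconds until a cached session token expires, given its ISO-8601 timestamp."""
    expires = datetime.fromisoformat(expiration_iso.replace("Z", "+00:00"))
    if now is None:
        now = datetime.now(timezone.utc)
    return (expires - now).total_seconds()

# Shaped like an entry from ~/.aws/cli/cache (hypothetical values):
entry = {"Credentials": {"Expiration": "2021-01-01T12:00:00Z"}}
print(seconds_left(entry["Credentials"]["Expiration"],
                   now=datetime(2021, 1, 1, 0, 0, tzinfo=timezone.utc)))  # -> 43200.0
```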
=== Adjusting Configuration for Multiple Roles ===
If you have multiple roles that you use equally often, and you don't need to use Toil, you can configure it something like this, with multiple profiles:
[default]
region = us-west-2
[profile pangenomics-developer]
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/melinda@ucsc.edu
duration_seconds = 43200
Once that is configured, you should be able to reference the profile you just created when using the aws command, like so:
$ aws s3 ls --profile pangenomics-developer
==Tag Your Resources==
When you start using AWS resources (instances, networks, etc.), it is very important that you "tag" your resources with the "Owner" tag (note the capital "O"). "Owner" is the key, and its value is your IAM username (i.e. your email address). So, for example, if I spin up an instance, I would tag it during or after creation with something like:
Owner = bob@ucsc.edu
If you do not tag your instances, '''they will automatically be terminated within 10 minutes.''' Tag your instances especially, but tag every resource you create! This makes accounting much easier and lets the Program Managers know which resources are controlled by whom.
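If you create resources from scripts rather than the console, apply the same tag there. A sketch of the tag structure that boto3's create_tags call and the awscli expect (the instance ID below is hypothetical):

```python
def owner_tag(iam_username):
    """Build the required Owner tag (note the capital 'O') for an AWS resource."""
    return [{"Key": "Owner", "Value": iam_username}]

tags = owner_tag("bob@ucsc.edu")
print(tags)  # -> [{'Key': 'Owner', 'Value': 'bob@ucsc.edu'}]

# With boto3 you would then apply it roughly like this (not run here):
#   import boto3
#   boto3.client("ec2").create_tags(Resources=["i-0123456789abcdef0"], Tags=tags)
```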
How to access the public servers
== How to Gain Access to the Public Genomics Institute Compute Servers ==
If you need access to the Genomics Institute compute servers, please complete this form: https://app.smartsheet.com/b/form/a76dbd90ba0240ab9ea9d39b390586ce. There are two parts to this process.
1. The user fills in ALL required fields and submits the form.
2. The Sponsor/PI then receives an email from Smartsheet; please fill in all required fields and submit.
Once we receive your completed request, we will create your account and go over the details with you in a short Zoom meeting.
== Account Expiration ==
Your UNIX account will have an expiration date associated with it after creation. If your account was created in January, February or March, then your account expiration date will be July 1st of the '''current year'''. If the account was created after March, then your expiration date will be July 1st of the '''following year'''.
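The rule above can be written down precisely; this sketch computes the expiration date from an account's creation date:

```python
from datetime import date

def account_expiration(created):
    """July 1st of the current year for Jan-Mar creations, else July 1st of the next year."""
    year = created.year if created.month <= 3 else created.year + 1
    return date(year, 7, 1)

print(account_expiration(date(2021, 2, 15)))  # -> 2021-07-01
print(account_expiration(date(2021, 9, 3)))   # -> 2022-07-01
```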
You will receive notice by email when your account is about to expire. To renew, simply ask the PI who sponsored you (named in the notice) to email '''cluster-admin@soe.ucsc.edu''' requesting that your account be renewed for another year.
If your account expires, the account will be suspended and you will no longer be able to login or view any data you may have in our systems. Any automated scripts (owned by you) that run via cron or other mechanisms will cease to function.
== Server Types and Management==
You can log into our public compute servers via SSH:
'''courtyard.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space
'''plaza.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space
These servers are running CentOS 7.5 Linux. They are managed by the Genomics Institute Cluster Admin group. If you need software installed on them, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory is located at "/public/home/username" and has a 30GB quota. Group storage directories are created per PI, and each has a 15TB quota. For example, if David Haussler is the PI you report to directly, the directory would be /public/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the '/usr/bin/viewquota' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /public/groups/hausslerlab for example, you would do:
$ viewquota hausslerlab
Project quota on /export (/dev/mapper/export)
Project ID Used Soft Hard Warn/Grace
---------- ---------------------------------
hausslerlab 1.8T 15T 16T 00 [------]
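If you want to watch your usage from a script, the viewquota output is straightforward to parse. A sketch against the sample line above (the K/M/G/T size-suffix handling is an assumption about the output format):

```python
def parse_size(s):
    """Convert a viewquota size like '1.8T' to bytes (binary units assumed)."""
    units = {"K": 1, "M": 2, "G": 3, "T": 4}
    suffix = s[-1].upper()
    if suffix in units:
        return int(float(s[:-1]) * 1024 ** units[suffix])
    return int(s)

# The sample data line from the viewquota output shown above:
line = "hausslerlab      1.8T    15T    16T   00 [------]"
name, used, soft, hard = line.split()[:4]
pct = 100 * parse_size(used) / parse_size(soft)
print(f"{name}: {pct:.0f}% of soft quota used")  # -> hausslerlab: 12% of soft quota used
```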
== Actually Doing Work and Computing ==
When doing research and running jobs, please be mindful of your resource consumption on the server you are on. Don't run so many threads or processes at once that you exhaust the available RAM or disk IO. If you are unsure of your potential RAM, CPU or disk impact, start small with one or two processes and work your way up from there. Before starting your jobs, check what is already happening on the server with the 'top' command to see who and what is running and what resources are already being consumed. If, after starting a process, the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Serving Files to the Public via the Web ==
If you want to set up a web page on courtyard, or serve files over HTTP from there, do this:
mkdir /public/home/''your_username''/public_html
chmod 755 /public/home/''your_username''/public_html
Put data in the public_html directory. The URL will be:
http://public.gi.ucsc.edu/''~username''/
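A quick way to sanity-check the steps above (sketched here against a temporary directory standing in for your home directory):

```shell
#!/bin/sh
# Sketch: create a public_html directory and confirm its permissions.
# HOMEDIR is a stand-in for /public/home/your_username on courtyard.
HOMEDIR=$(mktemp -d)
mkdir "$HOMEDIR/public_html"
chmod 755 "$HOMEDIR/public_html"      # world-readable, so the web server can serve it
stat -c '%a' "$HOMEDIR/public_html"   # prints 755
```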
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear in the event of a disk failure or other problem. Do not store important data there; if data is important, move it somewhere else soon after creation.
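One common pattern (a sketch; the job and file names are placeholders, and a mktemp directory stands in for /scratch so the example is self-contained) is to give each job its own directory under /scratch, copy results out, and clean up on exit:

```shell
#!/bin/sh
# Sketch: per-job temp directory under /scratch, cleaned up on exit.
SCRATCH=$(mktemp -d)                      # on the servers this would be /scratch
WORKDIR=$(mktemp -d "$SCRATCH/myjob.XXXXXX")
trap 'rm -rf "$WORKDIR"' EXIT             # scratch is not backed up: leave nothing behind
echo "important result" > "$WORKDIR/result.txt"
cp "$WORKDIR/result.txt" ./result.txt     # move anything worth keeping off /scratch
```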
a201cd4b6de029f4a29c356a580a1f193aab8385
223
222
2020-12-09T18:36:28Z
Haifang
1
/* How to Gain Access to the Public Genomics Institute Compute Servers */
wikitext
text/x-wiki
== How to Gain Access to the Public Genomics Institute Compute Servers ==
If you need access to the Genomics Institute compute servers please complete this form:
https://app.smartsheet.com/b/form/a76dbd90ba0240ab9ea9d39b390586ce. There are two parts to this process.
1. For the user, please fill in ALL required fields and submit.
2. For the Sponsor/PI - you will receive an email from Smartsheet. Please fill in all required fields and submit.
Once we receive your completed request, we will create your account and go over the details with you in a short Zoom meeting.
== Account Expiration ==
Your UNIX account will have an expiration date associated with it after creation. If your account was created in January, February or March, then your account expiration date will be July 1st of the '''current year'''. If the account was created after March, then your expiration date will be July 1st of the '''following year'''.
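The policy above amounts to a small date rule: creation in January through March means a July 1st cutoff in the same year, and anything later rolls to the next year. A sketch (GNU date assumed, hypothetical creation date):

```shell
#!/bin/sh
# Sketch: derive the account expiration date from the creation date.
# Jan-Mar -> July 1 of the current year; Apr-Dec -> July 1 of the next.
CREATED="2021-02-15"                  # hypothetical creation date
MONTH=$(date -d "$CREATED" +%m)       # GNU date
YEAR=$(date -d "$CREATED" +%Y)
if [ "$MONTH" -le 3 ]; then
    EXPIRES="$YEAR-07-01"
else
    EXPIRES="$((YEAR + 1))-07-01"
fi
echo "$EXPIRES"                       # prints 2021-07-01 for this example
```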
You will receive notice by email when your account is about to expire. To renew, simply ask the PI that sponsored you (named in the notice) to email '''cluster-admin@soe.ucsc.edu''' requesting that your account be renewed for another year.
If your account expires, it will be suspended and you will no longer be able to log in or view any data you may have on our systems. Any automated scripts (owned by you) that run via cron or other mechanisms will cease to function.
== Server Types and Management==
You can log into our public compute servers via SSH:
'''courtyard.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space
'''plaza.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space
These servers run CentOS 7.5 Linux and are managed by the Genomics Institute Cluster Admin group. If you need software installed on them, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory is located at /public/home/username and has a 30GB quota. Group storage directories are created per PI, and each has a 15TB quota. For example, if David Haussler is the PI you report to directly, the group directory would be /public/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the whole lab it belongs to, so be mindful of everyone's data usage and share the available 15TB accordingly.
On the compute servers you can check your group's current quota usage with the /usr/bin/viewquota command. You can only check the quota of a group you belong to (i.e., you are a member of the UNIX group of the same name). For example, to check the quota usage of /public/groups/hausslerlab:
 $ viewquota hausslerlab
 Project quota on /export (/dev/mapper/export)
  Project ID   Used   Soft   Hard  Warn/Grace
  ---------- ---------------------------------
  hausslerlab  1.8T    15T    16T  00 [------]
== Actually Doing Work and Computing ==
When doing research and running jobs, please be mindful of your resource consumption on the server you are on. Don't run so many threads or processes at once that they overrun the available RAM or disk I/O. If you are not sure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Also, before running your jobs, check what is already happening on the server with the 'top' command to see who and what else is running and what resources are already being consumed. If, after starting a process, you realize that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Serving Files to the Public via the Web ==
If you want to set up a web page on courtyard, or serve files over HTTP from there, do this:
mkdir /public/home/''your_username''/public_html
chmod 755 /public/home/''your_username''/public_html
Put data in the public_html directory. The URL will be:
http://public.gi.ucsc.edu/''~username''/
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear in the event of a disk failure or other problem. Do not store important data there; if data is important, move it somewhere else soon after creation.
459cbc4c92b7f3e0b9ff4cee8173bb587bac372c
241
223
2021-10-28T17:39:13Z
Weiler
3
wikitext
text/x-wiki
== How to Gain Access to the Public Genomics Institute Compute Servers ==
If you need access to the Genomics Institute compute servers please complete this form:
https://app.smartsheet.com/b/form/a76dbd90ba0240ab9ea9d39b390586ce. There are two parts to this process.
1. For the user, please fill in ALL required fields and submit.
2. For the Sponsor/PI - you will receive an email from Smartsheet. Please fill in all required fields and submit.
Once we receive your completed request, we will create your account and go over the details with you in a short Zoom meeting.
== Account and Storage Cost ==
Costs for having an active UNIX account and for storage (per TB) are listed in this document, under "Genomics Project Support", specifically "Genomics IT Systems User Support per user" and "Genomics Data Storage per TB":
https://planning.ucsc.edu/budget/rates-and-assessments/recharge-rates/docs/2021-22-approved-recharge-rates.pdf
== Account Expiration ==
Your UNIX account will have an expiration date associated with it after creation. If your account was created in January, February or March, then your account expiration date will be July 1st of the '''current year'''. If the account was created after March, then your expiration date will be July 1st of the '''following year'''.
You will receive notice by email when your account is about to expire. To renew, simply ask the PI that sponsored you (named in the notice) to email '''cluster-admin@soe.ucsc.edu''' requesting that your account be renewed for another year.
If your account expires, it will be suspended and you will no longer be able to log in or view any data you may have on our systems. Any automated scripts (owned by you) that run via cron or other mechanisms will cease to function.
== Server Types and Management==
You can log into our public compute servers via SSH:
'''courtyard.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space
'''plaza.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space
These servers run CentOS 7.5 Linux and are managed by the Genomics Institute Cluster Admin group. If you need software installed on them, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory is located at /public/home/username and has a 30GB quota. Group storage directories are created per PI, and each has a 15TB quota. For example, if David Haussler is the PI you report to directly, the group directory would be /public/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the whole lab it belongs to, so be mindful of everyone's data usage and share the available 15TB accordingly.
On the compute servers you can check your group's current quota usage with the /usr/bin/viewquota command. You can only check the quota of a group you belong to (i.e., you are a member of the UNIX group of the same name). For example, to check the quota usage of /public/groups/hausslerlab:
 $ viewquota hausslerlab
 Project quota on /export (/dev/mapper/export)
  Project ID   Used   Soft   Hard  Warn/Grace
  ---------- ---------------------------------
  hausslerlab  1.8T    15T    16T  00 [------]
== Actually Doing Work and Computing ==
When doing research and running jobs, please be mindful of your resource consumption on the server you are on. Don't run so many threads or processes at once that they overrun the available RAM or disk I/O. If you are not sure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Also, before running your jobs, check what is already happening on the server with the 'top' command to see who and what else is running and what resources are already being consumed. If, after starting a process, you realize that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Serving Files to the Public via the Web ==
If you want to set up a web page on courtyard, or serve files over HTTP from there, do this:
mkdir /public/home/''your_username''/public_html
chmod 755 /public/home/''your_username''/public_html
Put data in the public_html directory. The URL will be:
http://public.gi.ucsc.edu/''~username''/
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear in the event of a disk failure or other problem. Do not store important data there; if data is important, move it somewhere else soon after creation.
5c7c480084007d3be82a3a151e908a90a66bc580
243
241
2021-11-10T18:34:09Z
Haifang
1
wikitext
text/x-wiki
== How to Gain Access to the Public Genomics Institute Compute Servers ==
If you need access to the Genomics Institute compute servers please complete this request form:
https://app.smartsheet.com/b/form/a76dbd90ba0240ab9ea9d39b390586ce. There are two parts to this process.
1. For the user, please fill in ALL required fields and submit.
2. For the Sponsor/PI - you will receive an email from Smartsheet. Please fill in all required fields and submit.
Once we receive your completed request, we will create your account and go over the details with you in a short Zoom meeting.
== Account and Storage Cost ==
Costs for having an active UNIX account and for storage (per TB) are listed in this document, under "Genomics Project Support", specifically "Genomics IT Systems User Support per user" and "Genomics Data Storage per TB":
https://planning.ucsc.edu/budget/rates-and-assessments/recharge-rates/docs/2021-22-approved-recharge-rates.pdf
== Account Expiration ==
Your UNIX account will have an expiration date associated with it after creation. If your account was created in January, February or March, then your account expiration date will be July 1st of the '''current year'''. If the account was created after March, then your expiration date will be July 1st of the '''following year'''.
You will receive notice by email when your account is about to expire. To renew, simply ask the PI that sponsored you (named in the notice) to email '''cluster-admin@soe.ucsc.edu''' requesting that your account be renewed for another year.
If your account expires, it will be suspended and you will no longer be able to log in or view any data you may have on our systems. Any automated scripts (owned by you) that run via cron or other mechanisms will cease to function.
== Server Types and Management==
You can log into our public compute servers via SSH:
'''courtyard.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space
'''plaza.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space
These servers run CentOS 7.5 Linux and are managed by the Genomics Institute Cluster Admin group. If you need software installed on them, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory is located at /public/home/username and has a 30GB quota. Group storage directories are created per PI, and each has a 15TB quota. For example, if David Haussler is the PI you report to directly, the group directory would be /public/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the whole lab it belongs to, so be mindful of everyone's data usage and share the available 15TB accordingly.
On the compute servers you can check your group's current quota usage with the /usr/bin/viewquota command. You can only check the quota of a group you belong to (i.e., you are a member of the UNIX group of the same name). For example, to check the quota usage of /public/groups/hausslerlab:
 $ viewquota hausslerlab
 Project quota on /export (/dev/mapper/export)
  Project ID   Used   Soft   Hard  Warn/Grace
  ---------- ---------------------------------
  hausslerlab  1.8T    15T    16T  00 [------]
== Actually Doing Work and Computing ==
When doing research and running jobs, please be mindful of your resource consumption on the server you are on. Don't run so many threads or processes at once that they overrun the available RAM or disk I/O. If you are not sure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Also, before running your jobs, check what is already happening on the server with the 'top' command to see who and what else is running and what resources are already being consumed. If, after starting a process, you realize that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Serving Files to the Public via the Web ==
If you want to set up a web page on courtyard, or serve files over HTTP from there, do this:
mkdir /public/home/''your_username''/public_html
chmod 755 /public/home/''your_username''/public_html
Put data in the public_html directory. The URL will be:
http://public.gi.ucsc.edu/''~username''/
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear in the event of a disk failure or other problem. Do not store important data there; if data is important, move it somewhere else soon after creation.
804d28ed49cfe254fd8a8c3545013b70105dac1c
Requirements for dbGaP Access
0
19
228
104
2021-01-25T19:06:23Z
Weiler
3
wikitext
text/x-wiki
If you need NIH dbGaP access, there are several requirements for gaining access - please complete all of them '''BEFORE''' requesting dbGaP credentials. NOTE: If you already have GI VPN access to the GI "Prism" environment, then you have already completed the requirements detailed below - let the GI Grants Team know and we can quickly move to getting you set up.
Please use this checklist to make sure that you have completed all '''three''' requirements.
'''1'''. Your PI's info and your PI's approval
'''2'''. NIH Public Security Refresher Course Certificate
'''3'''. Signed NIH Genomic Data Sharing Policy Agreement
'''1''': You are required to ask your PI or sponsor to email '''GI Grants Team (gi-grant.team@ucsc.edu)''' requesting dbGaP access for you - this email should include:
Your name
Your PI's name
PI's approval for this access
'''2''': You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it to the GI Grants Team. You must complete the course in a single continuous sitting in order to be able to print the certificate at the end:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2020 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
'''3''': Please print and read the entire NIH Genomic Data Sharing Policy agreement (download link below), sign the last page, then scan and email the executed document to gi-grant.team@ucsc.edu with a subject line that includes "NIH GDS document". By signing the document you agree that you have read and understood the policies described therein and that you will abide by them:
[[Media:NIH_GDS_Policy.pdf]]
We will correspond with you via email on when the appointment will be - please email the GI Grants Team about getting everything set up! ('''gi-grant.team@ucsc.edu''')
9cfe29ace321de95df784138ada5b8e182a7110b
229
228
2021-01-25T19:07:02Z
Weiler
3
wikitext
text/x-wiki
If you need NIH dbGaP access, there are several requirements for gaining access - please complete all of them '''BEFORE''' requesting dbGaP credentials. NOTE: If you already have GI VPN access to the GI "Prism" environment, then you have already completed the requirements detailed below - let the GI Grants Team know and we can quickly move to getting you set up.
Please use this checklist to make sure that you have completed all '''three''' requirements.
'''1'''. Your PI's info and your PI's approval
'''2'''. NIH Public Security Refresher Course Certificate
'''3'''. Signed NIH Genomic Data Sharing Policy Agreement
'''1''': You are required to ask your PI or sponsor to email '''GI Grants Team (gi-grant.team@ucsc.edu)''' requesting dbGaP access for you - this email should include:
Your name
Your PI's name
PI's approval for this access
'''2''': You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it to the GI Grants Team. You must complete the course in a single continuous sitting in order to be able to print the certificate at the end:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2020 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
'''3''': Please print and read the entire NIH Genomic Data Sharing Policy agreement (download link below), sign the last page, then scan and email the executed document to gi-grant.team@ucsc.edu with a subject line that includes "NIH GDS document". By signing the document you agree that you have read and understood the policies described therein and that you will abide by them:
[[Media:NIH_GDS_Policy.pdf]]
51cb896bb7a8b94c1937faf3ecdacef7ea201d64
235
229
2021-03-24T17:08:14Z
Weiler
3
wikitext
text/x-wiki
If you need NIH dbGaP access, there are several requirements for gaining access - please complete all of them '''BEFORE''' requesting dbGaP credentials. NOTE: If you already have GI VPN access to the GI "Prism" environment, then you have already completed the requirements detailed below - let Haifang Telc (haifang@ucsc.edu) know and we can quickly move to getting you set up.
Please use this checklist to make sure that you have completed all '''three''' requirements.
'''1'''. Your PI's info and your PI's approval
'''2'''. NIH Public Security Refresher Course Certificate
'''3'''. Signed NIH Genomic Data Sharing Policy Agreement
'''1''': You are required to ask your PI or sponsor to email '''Haifang Telc (haifang@ucsc.edu)''' requesting dbGaP access for you - this email should include:
Your name
Your PI's name
PI's approval for this access
'''2''': You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it to the GI Grants Team. You must complete the course in a single continuous sitting in order to be able to print the certificate at the end:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2020 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
'''3''': Please print and read the entire NIH Genomic Data Sharing Policy agreement (download link below), sign the last page, then scan and email the executed document to haifang@ucsc.edu with a subject line that includes "NIH GDS document". By signing the document you agree that you have read and understood the policies described therein and that you will abide by them:
[[Media:NIH_GDS_Policy.pdf]]
73402a813a8c220576091438bcfbf01e0d9f142d
AWS Shared Bucket Usage Graphs
0
29
231
2021-02-01T19:52:02Z
Weiler
3
Created page with "On this page are listed some pie charts indicating the usage breakdown of "shared" buckets on AWS, so as to get an idea of where the data is being used. The buckets are divid..."
wikitext
text/x-wiki
This page lists some pie charts showing the usage breakdown of "shared" buckets on AWS, to give an idea of where the data is being used. The buckets are divided up by account.
[[vg-dev]]
[http://logserv.gi.ucsc.edu/cgi-bin/vg-data.cgi?cmd=index&path=/s3fs/vg-data vg-data]
bd311707388d1215b5fea6e67a3d83aa8f60f9da
232
231
2021-02-01T19:54:29Z
Weiler
3
wikitext
text/x-wiki
This page lists some pie charts showing the usage breakdown of "shared" buckets on AWS, to give an idea of where the data is being used. The buckets are divided up by account.
<u>vg-dev</u>
[http://logserv.gi.ucsc.edu/cgi-bin/vg-data.cgi?cmd=index&path=/s3fs/vg-data vg-data]
4ed79699e53357a3214ee936800f902455d3dffc
233
232
2021-02-01T19:55:46Z
Weiler
3
wikitext
text/x-wiki
This page lists some pie charts showing the usage breakdown of "shared" buckets on AWS, to give an idea of where the data is being used. The buckets are divided up by account.
<u>
== vg-dev ==
</u>
[http://logserv.gi.ucsc.edu/cgi-bin/vg-data.cgi?cmd=index&path=/s3fs/vg-data vg-data]
02c9cb5e3827840657ae03c136f8a02c34c4cdaa
234
233
2021-02-01T19:56:16Z
Weiler
3
wikitext
text/x-wiki
This page lists some pie charts showing the usage breakdown of "shared" buckets on AWS, to give an idea of where the data is being used. The buckets are divided up by account.
<u>
== vg-dev Buckets ==
</u>
[http://logserv.gi.ucsc.edu/cgi-bin/vg-data.cgi?cmd=index&path=/s3fs/vg-data vg-data]
5203f62ebb8c2babcde5dc6381b7b28bcf5ddcef
AWS Best Practices
0
30
237
2021-08-23T19:13:15Z
Weiler
3
Created page with "When using AWS, there are a few things to keep in mind in order to keep costs down: '''EC2''' [[Instances:]] When using instances, always pick and instance type that just q..."
wikitext
text/x-wiki
When using AWS, there are a few things to keep in mind in order to keep costs down:
'''EC2'''
'''Instances:''' When using instances, always pick an instance type that just meets your needs and nothing much larger; otherwise the excess CPU time is wasted and costs more. Also, shut down your instance as soon as you no longer actively need it - instances that are shut down do not accrue cost.
'''EBS Volumes:''' When you create an EBS volume, it accrues cost whether or not it is attached to an instance, and whether or not it actually contains data. Avoid using EBS volumes for long-term storage where possible, as EBS is much more expensive per GB than S3.
Also, if you need to spawn many instances for a short period of time, try to use AWS "spot" instances when possible. It usually takes a little waiting to take advantage of spot instances successfully, but they cost 1/4 to 1/20 as much as on-demand instances.
'''S3'''
Remember that storing data in S3 costs money based on the amount of time the data spends in S3. If you don't plan on using the data in the near term, consider moving it to Glacier or Deep Glacier in order to save money on the storage. You can always pull the data back to regular S3 later if needed.
'''Tagging'''
It is extremely important to tag ''every single resource'' you use in AWS with the tag key "Owner", with the value being your IAM login name (i.e. your email address). Many EC2 and S3 resources will be deleted if not properly tagged, so tag each resource as soon as you create it. Any resource - a Lambda, an Elastic Load Balancer, etc. - can be tagged. This way, when cleanup time comes, we can see who owns what and ask about the status of things.
99d0ae16c386d7f531810dcf911ef423bf90694e
238
237
2021-08-23T19:13:52Z
Weiler
3
wikitext
text/x-wiki
When using AWS, there are a few things to keep in mind in order to keep costs down:
'''EC2'''
'''Instances:''' When using instances, always pick an instance type that just meets your needs and nothing much larger; otherwise the excess CPU time is wasted and costs more. Also, shut down your instance as soon as you no longer actively need it - instances that are shut down do not accrue cost.
'''EBS Volumes:''' When you create an EBS volume, it accrues cost whether or not it is attached to an instance, and whether or not it actually contains data. Avoid using EBS volumes for long-term storage where possible, as EBS is much more expensive per GB than S3.
Also, if you need to spawn many instances for a short period of time, try to use AWS "spot" instances when possible. It usually takes a little waiting to take advantage of spot instances successfully, but they cost 1/4 to 1/20 as much as on-demand instances.
'''S3'''
Remember that storing data in S3 costs money based on the amount of time the data spends in S3. If you don't plan on using the data in the near term, consider moving it to Glacier or Deep Glacier in order to save money on the storage. You can always pull the data back to regular S3 later if needed.
'''Tagging'''
It is extremely important to tag ''every single resource'' you use in AWS with the tag key "Owner", with the value being your IAM login name (i.e. your email address). Many EC2 and S3 resources will be deleted if not properly tagged, so tag each resource as soon as you create it. Any resource - a Lambda, an Elastic Load Balancer, etc. - can be tagged. This way, when cleanup time comes, we can see who owns what and ask about the status of things.
3447af70ad5567d0f79bfd7838463a6236b65c92
239
238
2021-08-23T19:15:06Z
Weiler
3
wikitext
text/x-wiki
When using AWS, there are a few things to keep in mind in order to keep costs down:
'''EC2'''
'''Instances:''' When using instances, always pick an instance type that just meets your needs and nothing much larger; otherwise the excess CPU time is wasted and costs more. Also, shut down your instance as soon as you no longer actively need it - instances that are shut down do not accrue cost.
'''EBS Volumes:''' When you create an EBS volume, it accrues cost whether or not it is attached to an instance, and whether or not it actually contains data. Avoid using EBS volumes for long-term storage where possible, as EBS is much more expensive per GB than S3.
Also, if you need to spawn many instances for a short period of time, try to use AWS "spot" instances when possible. It usually takes a little waiting to take advantage of spot instances successfully, but they cost 1/4 to 1/20 as much as on-demand instances.
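As a sketch, the aws CLI can request spot capacity directly at launch via --instance-market-options. The AMI ID and instance type below are placeholders, and the command is only echoed rather than run, since it would need real AWS credentials:

```shell
#!/bin/sh
# Sketch: launch an EC2 instance as a spot instance (placeholders throughout).
# Echoed rather than executed, since it needs real AWS credentials and an AMI.
AMI="ami-0123456789abcdef0"           # hypothetical AMI ID
CMD="aws ec2 run-instances --image-id $AMI --instance-type m5.large --instance-market-options MarketType=spot"
echo "$CMD"
```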
'''S3'''
Remember that storing data in S3 costs money based on the amount of time the data spends in S3. If you don't plan on using the data in the near term, consider moving it to Glacier or Deep Glacier in order to save money on the storage. You can always pull the data back to regular S3 later if needed.
'''Tagging'''
It is extremely important to tag ''every single resource'' you use in AWS with the tag key "Owner", with the value being your IAM login name (i.e. your email address). Many EC2 and S3 resources will be deleted if not properly tagged, so tag each resource as soon as you create it. Any resource - a Lambda, an Elastic Load Balancer, etc. - can be tagged. This way, when cleanup time comes, we can see who owns what and ask about the status of things.
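For example (a sketch with placeholder IDs, echoed rather than executed since it needs real AWS credentials), applying the required Owner tag to an instance looks like:

```shell
#!/bin/sh
# Sketch: apply the required "Owner" tag to an EC2 instance (placeholders).
OWNER="jdoe@ucsc.edu"                 # your IAM login name
INSTANCE_ID="i-0123456789abcdef0"     # hypothetical instance ID
CMD="aws ec2 create-tags --resources $INSTANCE_ID --tags Key=Owner,Value=$OWNER"
echo "$CMD"
```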
2f5a2b425838961ce7ef6841955521a9ea9fbaf3
Access to the Firewalled Compute Servers
0
17
242
139
2021-10-28T18:05:23Z
Weiler
3
wikitext
text/x-wiki
Before you can access the firewalled environment (Prism), you must get VPN access to it, which is detailed here:
[[Requirement for users to get GI VPN access]]
== Account and Storage Cost ==
Costs for having an active UNIX account and for storage (per TB) are listed in this document, under "Genomics Project Support", specifically "Genomics IT Systems User Support per user" and "Genomics Data Storage per TB":
https://planning.ucsc.edu/budget/rates-and-assessments/recharge-rates/docs/2021-22-approved-recharge-rates.pdf
== Account Expiration ==
Your UNIX account will have an expiration date associated with it after creation. If your account was created in January, February or March, then your account expiration date will be July 1st of the '''current year'''. If the account was created after March, then your expiration date will be July 1st of the '''following year'''.
You will receive notice by email when your account is about to expire. To renew, simply ask the PI that sponsored you (named in the notice) to email '''cluster-admin@soe.ucsc.edu''' requesting that your account be renewed for another year.
If your account expires, it will be suspended and you will no longer be able to log in or view any data you may have on our systems. Any automated scripts (owned by you) that run via cron or other mechanisms will cease to function.
== Server Types and Management==
After confirming your VPN software is working, you can ssh into one of the compute servers behind the VPN:
'''crimson.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space
'''razzmatazz.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space
'''mustard.prism''': 1.5TB RAM, 160 cores, 9TB local scratch space
These servers run CentOS 7.5 Linux and are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory is located at /private/home/username and has a 30GB quota. Group storage directories are created per PI, and each has a 15TB quota. For example, if David Haussler is the PI you report to directly, the group directory would be /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the whole lab it belongs to, so be mindful of everyone's data usage and share the available 15TB accordingly.
On the compute servers you can check your group's current quota usage with the '/usr/bin/viewquota' command. You can only check the quota of a group you are part of (i.e., you are a member of the UNIX group of the same name). To check the quota usage of /private/groups/hausslerlab, for example, you would run:
$ viewquota hausslerlab
Project quota on /export (/dev/mapper/export)
Project ID Used Soft Hard Warn/Grace
---------- ---------------------------------
hausslerlab 1.8T 15T 16T 00 [------]
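Building on the output format shown above, a hypothetical helper can report how close a group is to its soft quota. This is a sketch that assumes the Used and Soft columns are reported in the same unit (both in T here); the function name is an invention for illustration.

```shell
# Hypothetical helper: percentage of the soft quota used, parsed from a
# viewquota output line (assumes Used and Soft share the same unit suffix).
quota_pct() {
  echo "$1" | awk '{ printf "%.0f", 100 * ($2 + 0) / ($3 + 0) }'
}

line='hausslerlab 1.8T 15T 16T 00 [------]'
echo "hausslerlab is at $(quota_pct "$line")% of its soft quota"
```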
== Actually Doing Work and Computing ==
When doing research and running jobs, please be careful of your resource consumption on the server you are on. Don't run so many threads or processes at once that you exhaust the available RAM or disk IO. If you are not sure of your potential RAM, CPU or disk impact, start small with one or two processes and work your way up from there. Before running your jobs, check what is already happening on the server with the 'top' command to see who else is running work and what resources are already being consumed. If, after starting a process, you find that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
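The pre-flight check described above can be done from the shell. This sketch just prints the raw numbers (Linux-specific, via /proc); what counts as "too busy" is a judgment call for your workload.

```shell
# Pre-flight check before launching jobs on a shared server (Linux only).
cores=$(nproc)                                            # total CPU cores
avail_gb=$(awk '/MemAvailable/ {printf "%d", $2/1024/1024}' /proc/meminfo)
load=$(cut -d' ' -f1 /proc/loadavg)                       # 1-minute load average

echo "cores=$cores available_ram=${avail_gb}GB load_1min=$load"
```

If the 1-minute load is already near the core count, or available RAM is close to what your jobs need, start smaller or wait.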
== The Firewall ==
All servers in this environment are behind a firewall, so you must connect to the VPN in order to access them; they will not be accessible from the greater Internet without VPN. You will, however, be able to connect outbound from them to other servers on the Internet to copy data in, sync git repos, and the like. It is only inbound connections that will be blocked. All machines behind the firewall have the private domain name suffix "*.prism".
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear in the event of a disk failure or other problem. Do not store important data there; if it is important, it should be moved somewhere else very soon after creation.
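A safe pattern for the advice above: do heavy intermediate I/O in a private directory under /scratch, copy anything important off right away, then clean up. This is a sketch only; the group-directory path is a placeholder, and the snippet falls back to /tmp so it also runs on machines without a /scratch filesystem.

```shell
# Sketch of a /scratch workflow: private temp dir, copy results off, clean up.
scratch_root=/scratch
[ -d "$scratch_root" ] || scratch_root=${TMPDIR:-/tmp}   # fallback for testing

workdir=$(mktemp -d "$scratch_root/job.XXXXXX")          # private working dir
echo "intermediate data" > "$workdir/result.txt"         # stand-in for real work

# cp "$workdir/result.txt" /private/groups/yourlab/      # keep what matters
rm -rf "$workdir"                                        # free the scratch space
```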
fe83073a724923be6e17a8926ae63044d4b24f91
AWS S3 Lifecycle Management
0
31
245
2021-12-02T22:55:43Z
Righanse
5
Created page with "Test page"
wikitext
text/x-wiki
Test page
9dd22c5b755ad18afcfc0a30b91a6628948fcc77
246
245
2021-12-02T23:28:14Z
Righanse
5
wikitext
text/x-wiki
(This page is a work in progress. The policies defined below are still being adjusted, and are not actively deployed)
==AWS S3 Lifecycle Policy Overview==
AWS S3 buckets can be configured with lifecycle policies. These policies allow for automatically changing the storage class of objects based on the last time they were accessed. AWS S3 objects are typically stored in the Standard storage class, which provides easy access, but has relatively high GB/month storage costs. Other storage classes, such as Infrequent Access and Glacier, are more suitable for objects that are rarely accessed. These storage classes maintain a much lower GB/month cost as compared to the Standard S3 storage class.
In order to reduce monthly S3 storage costs, the UCSC GI has implemented global S3 lifecycle policies that transition objects to cheaper storage classes if they have not been accessed recently. Specifically, AWS S3 objects are transitioned as follows:
* After '''30''' days of inactivity, objects are transitioned to the '''Infrequent Access''' storage class.
* After '''180''' days of inactivity, objects are transitioned to the '''Glacier''' storage class.
8ed0a2a3003660ef34d3e7c6e69e666e5f420c01
247
246
2021-12-02T23:43:42Z
Righanse
5
wikitext
text/x-wiki
(This page is a work in progress. The policies defined below may still be adjusted, and are not actively deployed)
==AWS S3 Lifecycle Policy Overview==
AWS S3 buckets can be configured with lifecycle policies. These policies allow for automatically changing the storage class of objects based on the last time they were accessed. AWS S3 objects are typically stored in the Standard storage class, which provides easy access, but has relatively high GB/month storage costs. Other storage classes, such as Infrequent Access and Glacier, are more suitable for objects that are rarely accessed. These storage classes maintain a much lower GB/month cost as compared to the Standard S3 storage class, but also incur charges for access and retrieval.
In order to reduce monthly S3 storage costs, the UCSC GI has implemented global S3 lifecycle policies that transition objects to cheaper storage classes if they have not been accessed recently. Specifically, AWS S3 objects are transitioned as follows:
* After '''30''' days of inactivity, objects are transitioned to the '''Infrequent Access''' storage class.
* After '''180''' days of inactivity, objects are transitioned to the '''Glacier''' storage class.
==Object Recovery==
Objects in S3 that have been transitioned out of the Standard storage class can be recovered. Because objects in Infrequent Access and Glacier incur increased access charges, an object that is expected to be accessed frequently should be returned to the Standard storage class. The recovery method depends on the storage class the object is in, with Glacier being more time-consuming and challenging to restore from than Infrequent Access.
It is important to note that, in the event an object is recovered, the timer for transitioning the object back to Infrequent Access and Glacier is still running, and the objects will be moved again in the future if they meet the criteria.
===Infrequent Access===
===Glacier===
09637d2e05210ecc3ef2e959ac49107d164aae05
248
247
2021-12-09T16:43:15Z
Righanse
5
wikitext
text/x-wiki
(This page is a work in progress. The policies defined below may still be adjusted, and are not actively deployed)
==AWS S3 Lifecycle Policy Overview==
AWS S3 buckets can be configured with lifecycle policies. These policies allow for automatically changing the storage class of objects based on the last time they were accessed. AWS S3 objects are typically stored in the Standard storage class, which provides easy access, but has relatively high GB/month storage costs. Other storage classes, such as Infrequent Access and Glacier are more suitable for objects that are rarely accessed. These storage classes maintain a much lower GB/month cost as compared to the Standard S3 storage class, but also incur charges for access and retrieval.
It is recommended to utilize the appropriate storage classes for your data.
* If you have data that you do not expect to access more than once a '''month''', AWS Infrequent Access is a reasonable storage class to use.
* If you have data that you do not expect to access more than once a '''year''', AWS Glacier is a reasonable storage class to use.
==UCSC GI Automated Policy==
In order to reduce monthly S3 storage costs, the UCSC GI has implemented global S3 lifecycle policies that transition objects to AWS Intelligent Tiering, which monitors S3 object access patterns and transitions objects accordingly.
* Objects uploaded to S3 will remain in the Standard storage class for '''1''' day, at which point they will be transitioned to Intelligent Tiering.
* Old and new S3 buckets will have this lifecycle policy automatically attached.
AWS Intelligent Tiering functionality:
* Intelligent Tiering '''does not''' change object access patterns. This means you can still treat the object as if it were in the Standard storage class.
* Intelligent Tiering '''does not''' incur charges for object retrieval from different tiers.
For more details on AWS Intelligent Tiering, see the [https://aws.amazon.com/s3/storage-classes/intelligent-tiering/ AWS Docs]
1c260585db1fcd6fa3661f1dd5eff683be11da43
249
248
2021-12-09T17:03:07Z
Righanse
5
wikitext
text/x-wiki
(This page is a work in progress. The policies defined below may still be adjusted, and are not actively deployed)
==AWS S3 Lifecycle Policy Overview==
AWS S3 buckets can be configured with lifecycle policies. These policies allow for automatically changing the storage class of objects based on the last time they were accessed. AWS S3 objects are typically stored in the Standard storage class, which provides easy access, but has relatively high GB/month storage costs. Other storage classes, such as Infrequent Access and Glacier are more suitable for objects that are rarely accessed. These storage classes maintain a much lower GB/month cost as compared to the Standard S3 storage class, but also incur charges for access and retrieval.
It is recommended to utilize the appropriate storage classes for your data.
* If you have data that you do not expect to access more than once a '''month''', AWS Infrequent Access is a reasonable storage class to use.
* If you have data that you do not expect to access more than once a '''year''', AWS Glacier is a reasonable storage class to use.
==UCSC GI Automated Policy==
In order to reduce monthly S3 storage costs, the UCSC GI has implemented global S3 lifecycle policies that transition objects to AWS Intelligent Tiering, which monitors S3 object access patterns and transitions objects accordingly.
* Objects uploaded to S3 will remain in the Standard storage class for '''1''' day, at which point they will be transitioned to Intelligent Tiering.
* Old and new S3 buckets will have this lifecycle policy automatically attached.
AWS Intelligent Tiering functionality:
* Intelligent Tiering '''does not''' change object access patterns. This means you do not need to execute special API commands to access objects.
* Intelligent Tiering '''does not''' incur charges for object retrieval from different tiers.
For more details on AWS Intelligent Tiering, see the [https://aws.amazon.com/s3/storage-classes/intelligent-tiering/ AWS Docs]
947588956c9fcde8c485449587c769b4967d3119
250
249
2021-12-09T17:18:28Z
Righanse
5
wikitext
text/x-wiki
(This page is a work in progress. The policies defined below may still be adjusted, and are not actively deployed)
==AWS S3 Lifecycle Policy Overview==
AWS S3 buckets can be configured with lifecycle policies. These policies allow for automatically changing the storage class of objects based on the last time they were modified or accessed. AWS S3 objects are stored in the Standard storage class by default, which provides easy access, but has relatively high GB/month storage costs. Other storage classes, such as Infrequent Access and Glacier are more suitable for objects that are rarely accessed. These storage classes maintain a much lower GB/month cost as compared to the Standard S3 storage class, but also incur charges for access and retrieval.
It is recommended to utilize the appropriate storage classes for your data.
* If you have data that you do not expect to access more than once a '''month''', AWS Infrequent Access is a reasonable storage class to use.
* If you have data that you do not expect to access more than once a '''year''', AWS Glacier is a reasonable storage class to use.
==UCSC GI Automated Policy==
In order to reduce monthly S3 storage costs, the UCSC GI has implemented global S3 lifecycle policies that transition objects to AWS Intelligent Tiering, which monitors S3 object access patterns and transitions objects to more efficient storage classes accordingly.
* Objects uploaded to S3 will remain in the Standard storage class for '''1''' day, at which point they will be transitioned to Intelligent Tiering.
* Old and new S3 buckets will have this lifecycle policy automatically attached.
AWS Intelligent Tiering functionality:
* Intelligent Tiering '''does not''' change object access patterns. This means you do not need to execute special API commands to access objects.
* Intelligent Tiering '''does not''' incur charges for object retrieval from different tiers.
For more details on AWS Intelligent Tiering, see the [https://aws.amazon.com/s3/storage-classes/intelligent-tiering/ AWS Docs]
2340fd051043d68ae67a255ca6ead028701a561f
251
250
2021-12-09T17:19:48Z
Righanse
5
wikitext
text/x-wiki
==AWS S3 Lifecycle Policy Overview==
AWS S3 buckets can be configured with lifecycle policies. These policies allow for automatically changing the storage class of objects based on the last time they were modified or accessed. AWS S3 objects are stored in the Standard storage class by default, which provides easy access, but has relatively high GB/month storage costs. Other storage classes, such as Infrequent Access and Glacier are more suitable for objects that are rarely accessed. These storage classes maintain a much lower GB/month cost as compared to the Standard S3 storage class, but also incur charges for access and retrieval.
It is recommended to utilize the appropriate storage classes for your data.
* If you have data that you do not expect to access more than once a '''month''', AWS Infrequent Access is a reasonable storage class to use.
* If you have data that you do not expect to access more than once a '''year''', AWS Glacier is a reasonable storage class to use.
==UCSC GI Automated Policy==
In order to reduce monthly S3 storage costs, the UCSC GI has implemented global S3 lifecycle policies that transition objects to AWS Intelligent Tiering, which monitors S3 object access patterns and transitions objects to more efficient storage classes accordingly.
* Objects uploaded to S3 will remain in the Standard storage class for '''1''' day, at which point they will be transitioned to Intelligent Tiering.
* Old and new S3 buckets will have this lifecycle policy automatically attached.
AWS Intelligent Tiering functionality:
* Intelligent Tiering '''does not''' change object access patterns. This means you do not need to execute special API commands to access objects.
* Intelligent Tiering '''does not''' incur charges for object retrieval from different tiers.
For more details on AWS Intelligent Tiering, see the [https://aws.amazon.com/s3/storage-classes/intelligent-tiering/ AWS Docs]
71814a5d00158e08b021f0708d285cff936d1590
Genomics Institute Computing Information
0
6
252
236
2021-12-09T17:21:05Z
Righanse
5
/* Amazon Web Services Information */
wikitext
text/x-wiki
Welcome to the Genomic Institute Computing Information Repository! Browse the below topics for help in the area you are curious about.
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment (PRISM)==
*[[Access to the Firewalled Compute Servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
==Kubernetes Information==
*[[Computational Genomics Kubernetes Installation]]
*[[Undiagnosed Disease Project Kubernetes Installation]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
40e2665a2daedb908ccbe7393d615b8fd769395b
269
252
2023-03-09T01:18:45Z
Weiler
3
wikitext
text/x-wiki
Welcome to the Genomic Institute Computing Information Repository! Browse the below topics for help in the area you are curious about.
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment (PRISM)==
*[[Access to the Firewalled Compute Servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
==Kubernetes Information==
*[[Computational Genomics Kubernetes Installation]]
*[[Undiagnosed Disease Project Kubernetes Installation]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
4b75bf2c13f246eee824b0a8c2fc77bc21b547d0
271
269
2023-03-09T01:47:42Z
Weiler
3
wikitext
text/x-wiki
Welcome to the Genomic Institute Computing Information Repository! Browse the below topics for help in the area you are curious about.
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment (PRISM)==
*[[Access to the Firewalled Compute Servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
*[[Annotated Slurm Script]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
==Kubernetes Information==
*[[Computational Genomics Kubernetes Installation]]
*[[Undiagnosed Disease Project Kubernetes Installation]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
f065dbbfaa1e4eb9a40f2443fe95aa3b9c994d1f
273
271
2023-03-09T01:49:11Z
Weiler
3
wikitext
text/x-wiki
Welcome to the Genomic Institute Computing Information Repository! Browse the below topics for help in the area you are curious about.
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment (PRISM)==
*[[Access to the Firewalled Compute Servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Annotated Slurm Script]]
==Kubernetes Information==
*[[Computational Genomics Kubernetes Installation]]
*[[Undiagnosed Disease Project Kubernetes Installation]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
af56d8300e75c1341921011b36a122df5302ef1a
281
273
2023-03-09T03:24:22Z
Weiler
3
wikitext
text/x-wiki
Welcome to the Genomic Institute Computing Information Repository! Browse the below topics for help in the area you are curious about.
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment (PRISM)==
*[[Access to the Firewalled Compute Servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
==Kubernetes Information==
*[[Computational Genomics Kubernetes Installation]]
*[[Undiagnosed Disease Project Kubernetes Installation]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
40440b0f7318f6b5b7c518cb67879754d9f619fd
285
281
2023-03-09T03:32:25Z
Weiler
3
/* Slurm at the Genomics Institute */
wikitext
text/x-wiki
Welcome to the Genomic Institute Computing Information Repository! Browse the below topics for help in the area you are curious about.
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment (PRISM)==
*[[Access to the Firewalled Compute Servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[Quick Reference Guide]]
==Kubernetes Information==
*[[Computational Genomics Kubernetes Installation]]
*[[Undiagnosed Disease Project Kubernetes Installation]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
2250771d33c8a619743fc7c09e53d71dae09a3be
298
285
2023-05-02T05:20:04Z
Weiler
3
/* Slurm at the Genomics Institute */
wikitext
text/x-wiki
Welcome to the Genomic Institute Computing Information Repository! Browse the topics below for help in the area you are curious about.
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment (PRISM)==
*[[Access to the Firewalled Compute Servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[GPU Resources]]
*[[Quick Reference Guide]]
==Kubernetes Information==
*[[Computational Genomics Kubernetes Installation]]
*[[Undiagnosed Disease Project Kubernetes Installation]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
ca0a47ea430bc929bd22feda96fba442396e26fd
AWS S3 Lifecycle Management
0
31
253
251
2021-12-09T19:49:37Z
Righanse
5
wikitext
text/x-wiki
==AWS S3 Lifecycle Policy Overview==
AWS S3 buckets can be configured with lifecycle policies. These policies allow for automatically changing the storage class of objects based on the last time they were modified or accessed. AWS S3 objects are stored in the Standard storage class by default, which provides easy access, but has relatively high GB/month storage costs. Other storage classes, such as Infrequent Access and Glacier are more suitable for objects that are rarely accessed. These storage classes maintain a much lower GB/month cost as compared to the Standard S3 storage class, but also incur charges for access and retrieval.
It is recommended to utilize the appropriate storage classes for your data.
* If you have data that you do not expect to access more than once a '''month''', AWS Infrequent Access is a reasonable storage class to use.
* If you have data that you do not expect to access more than once a '''year''', AWS Glacier is a reasonable storage class to use.
==UCSC GI Automated Policy==
In order to reduce monthly S3 storage costs, the UCSC GI has implemented global S3 lifecycle policies that transition objects to AWS Intelligent-Tiering, which monitors S3 object access patterns and transitions objects to more efficient storage classes accordingly.
* Objects uploaded to S3 will remain in the Standard storage class for '''1''' day, at which point they will be transitioned to Intelligent-Tiering.
* Old and new S3 buckets will have this lifecycle policy automatically attached.
AWS Intelligent Tiering functionality:
* Intelligent-Tiering '''does not''' change object access patterns. This means you do not need to execute special API commands to access objects.
* Intelligent-Tiering '''does not''' incur charges for object retrieval from different tiers.
For more details on AWS Intelligent Tiering, see the [https://aws.amazon.com/s3/storage-classes/intelligent-tiering/ AWS Docs]
0d71db3c7b59b3772507e00268bc26c088804d53
254
253
2022-03-09T19:48:13Z
Anovak
4
Explain how to restore
wikitext
text/x-wiki
==AWS S3 Lifecycle Policy Overview==
AWS S3 buckets can be configured with lifecycle policies. These policies allow for automatically changing the storage class of objects based on the last time they were modified or accessed. AWS S3 objects are stored in the Standard storage class by default, which provides easy access, but has relatively high GB/month storage costs. Other storage classes, such as Infrequent Access and Glacier are more suitable for objects that are rarely accessed. These storage classes maintain a much lower GB/month cost as compared to the Standard S3 storage class, but also incur charges for access and retrieval.
It is recommended to utilize the appropriate storage classes for your data.
* If you have data that you do not expect to access more than once a '''month''', AWS Infrequent Access is a reasonable storage class to use.
* If you have data that you do not expect to access more than once a '''year''', AWS Glacier is a reasonable storage class to use.
==UCSC GI Automated Policy==
In order to reduce monthly S3 storage costs, the UCSC GI has implemented global S3 lifecycle policies that transition objects to AWS Intelligent-Tiering, which monitors S3 object access patterns and transitions objects to more efficient storage classes accordingly.
* Objects uploaded to S3 will remain in the Standard storage class for '''1''' day, at which point they will be transitioned to Intelligent-Tiering.
* Old and new S3 buckets will have this lifecycle policy automatically attached.
AWS Intelligent Tiering functionality:
* Intelligent-Tiering '''does not''' change object access patterns. This means you do not need to execute special API commands to access objects.
* Intelligent-Tiering '''does not''' incur charges for object retrieval from different tiers.
For more details on AWS Intelligent Tiering, see the [https://aws.amazon.com/s3/storage-classes/intelligent-tiering/ AWS Docs]
==Restoring Objects==
If an object has not been accessed for a while, you may encounter an error like this when trying to access it:
<code>
An error occurred (InvalidObjectState) when calling the GetObject operation: The operation is not valid for the object's access tier
</code>
This means that the object is in Glacier, either because somebody put it there, or because Intelligent-Tiering moved it there after it was not accessed for a while. If you want to access it, you will need to restore it (and our AWS account will be billed for doing so).
To restore an object, you can use the S3 section of the AWS web console.
You can also restore an object from the command line with the AWS CLI tool. To restore the object '''s3://bucket-name/path/to/object.dat''' and make it accessible for the next week, you would issue the command:
<code>
aws s3api restore-object --restore-request Days=7 --bucket "bucket-name" --key "path/to/object.dat"
</code>
Note that you need to specify the bucket name and key within the bucket separately, instead of using an S3 URI.
7b738552d74eb8f99f176ceaa407aa1d15e2134b
255
254
2022-03-09T19:58:16Z
Anovak
4
/* Restoring Objects */
wikitext
text/x-wiki
==AWS S3 Lifecycle Policy Overview==
AWS S3 buckets can be configured with lifecycle policies. These policies allow for automatically changing the storage class of objects based on the last time they were modified or accessed. AWS S3 objects are stored in the Standard storage class by default, which provides easy access, but has relatively high GB/month storage costs. Other storage classes, such as Infrequent Access and Glacier are more suitable for objects that are rarely accessed. These storage classes maintain a much lower GB/month cost as compared to the Standard S3 storage class, but also incur charges for access and retrieval.
It is recommended to utilize the appropriate storage classes for your data.
* If you have data that you do not expect to access more than once a '''month''', AWS Infrequent Access is a reasonable storage class to use.
* If you have data that you do not expect to access more than once a '''year''', AWS Glacier is a reasonable storage class to use.
==UCSC GI Automated Policy==
In order to reduce monthly S3 storage costs, the UCSC GI has implemented global S3 lifecycle policies that transition objects to AWS Intelligent-Tiering, which monitors S3 object access patterns and transitions objects to more efficient storage classes accordingly.
* Objects uploaded to S3 will remain in the Standard storage class for '''1''' day, at which point they will be transitioned to Intelligent-Tiering.
* Old and new S3 buckets will have this lifecycle policy automatically attached.
AWS Intelligent Tiering functionality:
* Intelligent-Tiering '''does not''' change object access patterns. This means you do not need to execute special API commands to access objects.
* Intelligent-Tiering '''does not''' incur charges for object retrieval from different tiers.
For more details on AWS Intelligent Tiering, see the [https://aws.amazon.com/s3/storage-classes/intelligent-tiering/ AWS Docs]
==Restoring Objects==
If an object has not been accessed for a while, you may encounter an error like this when trying to access it:
<code>
An error occurred (InvalidObjectState) when calling the GetObject operation: The operation is not valid for the object's access tier
</code>
This means that the object is in Glacier, either because somebody put it there, or because Intelligent-Tiering moved it there after it was not accessed for a while. If you want to access it, you will need to restore it (and our AWS account will be billed for doing so).
To restore an object, you can use the S3 section of the AWS web console.
You can also restore an object from the command line with the AWS CLI tool. To restore the object '''s3://bucket-name/path/to/object.dat''' and make it accessible for the next week, you would issue the command:
<code>
aws s3api restore-object --restore-request Days=7 --bucket "bucket-name" --key "path/to/object.dat"
</code>
Note that you need to specify the bucket name and key within the bucket separately, instead of using an S3 URI.
'''Restores from Glacier are not immediate, or even particularly fast.''' [http://vignette2.wikia.nocookie.net/starwars/images/0/0e/Citadel_data_vault.png/revision/latest?cb=20161220040411 Jyn Erso has to go down to the Scarif data vault and find the right data-tape], and it takes a few hours, even if your file is small.
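While a restore is pending or complete, <code>aws s3api head-object --bucket bucket-name --key path/to/object.dat</code> reports a Restore field of the form <code>ongoing-request="true|false"</code>, optionally followed by an expiry date. A small hypothetical helper to interpret that field might look like this:

```shell
# Hypothetical helper: interpret the Restore field from
# `aws s3api head-object` output for a Glacier object.
restore_status() {
  case "$1" in
    *'ongoing-request="true"'*)  echo "restore in progress" ;;
    *'ongoing-request="false"'*) echo "restored; temporary copy available" ;;
    *)                           echo "no restore requested" ;;
  esac
}

restore_status 'ongoing-request="false", expiry-date="Fri, 01 Apr 2022 00:00:00 GMT"'
```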
744e0cef0523af0f9fa76f19d70e64660bd7bb3a
256
255
2022-03-10T20:59:42Z
Anovak
4
/* Restoring Objects */
wikitext
text/x-wiki
==AWS S3 Lifecycle Policy Overview==
AWS S3 buckets can be configured with lifecycle policies. These policies allow for automatically changing the storage class of objects based on the last time they were modified or accessed. AWS S3 objects are stored in the Standard storage class by default, which provides easy access, but has relatively high GB/month storage costs. Other storage classes, such as Infrequent Access and Glacier, are more suitable for objects that are rarely accessed. These storage classes maintain a much lower GB/month cost as compared to the Standard S3 storage class, but also incur charges for access and retrieval.
It is recommended to utilize the appropriate storage classes for your data.
* If you have data that you do not expect to access more than once a '''month''', AWS Infrequent Access is a reasonable storage class to use.
* If you have data that you do not expect to access more than once a '''year''', AWS Glacier is a reasonable storage class to use.
==UCSC GI Automated Policy==
In order to reduce monthly S3 storage costs, the UCSC GI has implemented global S3 lifecycle policies that transition objects to AWS Intelligent-Tiering, which monitors S3 object access patterns and transitions objects to more efficient storage classes accordingly.
* Objects uploaded to S3 will remain in the Standard storage class for '''1''' day, at which point they will be transitioned to Intelligent-Tiering.
* Old and new S3 buckets will have this lifecycle policy automatically attached.
AWS Intelligent-Tiering functionality:
* Intelligent-Tiering '''does not''' change object access patterns. This means you do not need to execute special API commands to access objects.
* Intelligent-Tiering '''does not''' incur charges for object retrieval from different tiers.
For more details on AWS Intelligent-Tiering, see the [https://aws.amazon.com/s3/storage-classes/intelligent-tiering/ AWS Docs].
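To see which tier an object is currently in, you can inspect it with <code>aws s3api head-object</code>. A minimal sketch, using a placeholder bucket and key and a sample (not live) response:
<code>
# Real command (requires AWS credentials; bucket and key are placeholders):
#   aws s3api head-object --bucket "bucket-name" --key "path/to/object.dat"
# Sample response, abbreviated to the fields relevant to tiering:
response='{"ContentLength": 1024, "StorageClass": "INTELLIGENT_TIERING", "ArchiveStatus": "DEEP_ARCHIVE_ACCESS"}'
# Extract the StorageClass field from the JSON:
storage_class=$(printf '%s' "$response" | sed -n 's/.*"StorageClass": "\([^"]*\)".*/\1/p')
echo "$storage_class"   # INTELLIGENT_TIERING
</code>
An <code>ArchiveStatus</code> of ARCHIVE_ACCESS or DEEP_ARCHIVE_ACCESS indicates the object has been moved to an archive tier and will need a restore before it can be read.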
==Restoring Objects==
If an object has not been accessed for a while, you may encounter an error like this when trying to access it:
<code>
An error occurred (InvalidObjectState) when calling the GetObject operation: The operation is not valid for the object's access tier
</code>
This means that the object is in Glacier, either because somebody put it there, or because Intelligent-Tiering moved it there after it was not accessed for a while. If you want to access it, you will need to restore it (and our AWS account will be billed for doing so).
To restore an object, you can use the S3 section of the AWS web console.
You can also restore an object from the command line with the AWS CLI tool. To restore the object '''s3://bucket-name/path/to/object.dat''', you would issue the command:
<code>
aws s3api restore-object --restore-request "{}" --bucket "bucket-name" --key "path/to/object.dat"
</code>
If the object was manually put in Glacier, you would instead need <code>--restore-request "Days=7"</code>, or some other number of days.
Note that you need to specify the bucket name and key within the bucket separately, instead of using an S3 URI.
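Since the s3api commands take the bucket and key separately, a small shell sketch (the URI below is a placeholder) for splitting an S3 URI into the two pieces:
<code>
# Derive the --bucket and --key arguments from an S3 URI using plain
# POSIX parameter expansion (URI is a placeholder):
uri="s3://bucket-name/path/to/object.dat"
path="${uri#s3://}"    # drop the scheme prefix
bucket="${path%%/*}"   # everything before the first slash
key="${path#*/}"       # everything after the first slash
echo "$bucket"   # bucket-name
echo "$key"      # path/to/object.dat
</code>
The resulting <code>$bucket</code> and <code>$key</code> values can be passed directly to the restore-object command above.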
'''Restores from Glacier are not immediate, or even particularly fast.''' [http://vignette2.wikia.nocookie.net/starwars/images/0/0e/Citadel_data_vault.png/revision/latest?cb=20161220040411 Jyn Erso has to go down to the Scarif data vault and find the right data-tape], and it takes a few hours, even if your file is small.
f0a578cf36f62835edfbd9e56f9c4fa9b6227a1f
AWS Account List and Numbers
0
22
266
264
2023-02-28T23:36:42Z
Weiler
3
wikitext
text/x-wiki
This is a list of our currently available AWS accounts and their account numbers:
ucsc-bd2k : 862902209576
ucsc-toil-dev : 318423852362
ucsc-vg-dev : 781907127277
ucsc-platform-dev : 719818754276
comparative-genomics-dev : 162786355865
nanopore-dev : 270442831226
ucsc-cgp-production : 097093801910
platform-hca-dev : 122796619775
anvil-dev : 608666466534
gi-gateway : 652235167018
pangenomics : 422448306679
braingeneers : 443872533066
ucsctreehouse : 238605363322
ucsc-bisti-dev : 851631505710
ucsc-genome-browser : 784962239183
dockstore-dev : 635220370222
ucsc-spatial : 541180793903
platform-hca-prod : 542754589326
miga-lab : 156518225147
platform-anvil-dev : 289950828509
platform-anvil-prod : 465330168186
platform-anvil-portal : 166384485414
agc-runs : 598929688444
80a0c054ef7242e195e2035d3cfc2751089bad9e
Requirement for users to get GI VPN access
0
9
263
262
2022-11-17T00:10:00Z
Jgarcia
2
wikitext
text/x-wiki
Before you are allowed access to our firewalled/secure area ("Prism" or "CIRM"), you have to complete 3 items and provide the completed certificates or forms:
'''1''': You must take and complete the NIH Public Security Refresher Course online. You must complete the course in a single continuous sitting:
https://irtsectraining.nih.gov/FYR/00_000.aspx
The course is titled "2022 Information Security and Management Refresher". At the end you will be able to print out or save the completion certificate that should have your name on it.
'''2''': You need to print and sign the Genomics Institute VPN User Agreement, located here for download:
[[Media:GI_VPN_Policy.pdf]]
'''3''': Please print, read and sign the last page of the NIH Genomic Data Sharing Policy agreement, located here for download. By signing the document you agree that you have read and understand the policies described therein and that you agree to abide by those policies:
[[Media:NIH_GDS_Policy.pdf]]
When you have the three documents described above ready, please complete this form: https://app.smartsheet.com/b/form/a76dbd90ba0240ab9ea9d39b390586ce.
There are two parts in this process.
1. For the user, please fill in ALL required fields '''and attach''' all three required documents described above.
2. For the Sponsor/PI - you will receive an email from Smartsheet. Please fill in all required fields and submit.
We will receive your completed request and we will create your account and go over the details via a short zoom meeting with you.
You will need access to the "eduroam" wireless network '''prior''' to your zoom appointment if you are on campus. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions preventing it from functioning. Some other universities have such restrictions (notably UCSF), but most other wireless networks, including home networks, should work fine.
Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop:
You will need a laptop running OS X, Windows, or Ubuntu.
For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Latest Stable version.
For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
For Ubuntu, please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
We will correspond with you via email on when the appointment will be. The zoom meeting can take up to 30 minutes per person depending on whether or not any issues come up during the software setup. If you show up for your appointment without one (or more) of the above outlined requirements, we will have to reschedule your appointment for a time when you can arrive after completing the above requirements.
'''PLEASE NOTE:''' Because of the overhead required in setting up VPN access, please only request access if you have an immediate need to work on data that exists behind the firewall. We have had a decent number of people request access and go through the setup but then never use it. In other words, please do not request access because "one day you might need it", but because you '''do''' actually need it!
'''ALSO NOTE:''' VPN accounts expire after one year from the date of first gaining access. To renew for another year you will need your PI/sponsor to send us a note asking for renewal.
b9d74e23b865cce076353c7742e495b2492fa50c
Overview of Getting and Using an AWS IAM Account
0
21
259
220
2022-07-22T23:53:16Z
Anovak
4
/* Switching Roles into Another AWS Account */ Note where to get the name.
wikitext
text/x-wiki
__TOC__
== Getting AWS (Amazon Web Services) Access ==
The Genomics Institute has a series of AWS accounts that each support different projects. If you become associated with one or more of those projects, you will need access to the corresponding account or accounts. We manage AWS IAM access through a single 'top level' account that everyone logs into; once you are logged in there, you "Switch Role" into the sub-account where you actually run things.
To get access, you will need your PI or Project Manager to email cluster-admin (cluster-admin@soe.ucsc.edu) asking for an AWS account for you, naming in that email the projects you will have access to. The cluster-admin group will contact you with your login credentials. Once you log in, you can change your password if you want to, and you will be able to set up MFA (Multi Factor Authentication) for your account. You will be required to use MFA in order to "Switch Role" into any of the sub-accounts for the projects you are working on.
The login URL to use when logging in to the top level account is listed below. The top level account is known as "gi-gateway":
https://gi-gateway.signin.aws.amazon.com/console
When you login, you '''may''' see a couple error messages on the AWS dashboard saying you don't have access to view certain resources - '''this is normal''', so just ignore the error messages.
== Configuring Account Credentials ==
Once you login to the gi-gateway, you will have very few permissions to do anything there - which is normal, since you will not be working in that account anyway. The gi-gateway account is just there to authenticate you to AWS.
'''Changing Your Password'''
You can change your password by clicking on your username on the top right of the web browser window, just to the right of the little bell. If your username is melinda@ucsc.edu, for example:
* Click '''"melinda@ucsc.edu @ gi-gateway"''' on the top right of your browser window.
* Click the '''"My Security Credentials"''' drop-down menu option.
* Click the '''"Change Password"''' button to change your password.
Note that we have a password strength policy in place, so your password must conform to the following requirements:
* Your password must be at least 10 characters long
* Your password must contain at least one lowercase letter
* Your password must contain at least one non-alphanumeric character
* Your password must contain at least one number
You will also need to configure MFA on your account before you will be allowed to switch roles into another account.
'''Configuring MFA'''
To configure MFA (Multi Factor Authentication), the most common way to do it is to use '''Google Authenticator''', which is an app available for Apple and Android based cell phones and mobile devices. The app is free, simply download it from the app store to your cell phone or tablet to get started. Other MFA apps may also work but we have not tested everything out there.
Once you have Google Authenticator installed, log into the gi-gateway account using the above URL, then:
* Click '''"melinda@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, melinda@ucsc.edu is an example).
* Click the '''"My Security Credentials"''' drop-down menu option.
* Scroll down to the MFA (Multi-Factor Authentication) section of the page, and click '''"Assign MFA Device"'''.
* In the following menu select '''"Virtual MFA Device"'''.
* In the following window click the '''"Show QR Code"''' link, and the MFA QR barcode will appear on your screen.
* Open the Google Authenticator app on your mobile device, and click the little "+" symbol in the top right corner of the app to add an account.
* You will then need to select "Scan Barcode" in the Google Authenticator app to continue, and aim your mobile device camera at the QR barcode.
* The new account MFA device should then be set up and you should see a 6 digit number with a small timer to the right of it. You must type one 6 digit code that it displays into your web browser when asked, then wait for the next code to appear after the timer expires, and type that into the second field. It should then inform you that you have successfully associated an MFA device with your account.
Once you have associated an MFA device with the 'gi-gateway' Account, '''log out''', then log back in. It will ask for your username and password, and then ask for your MFA code, which you can view by opening Google Authenticator and seeing what code it is displaying at that time. The code changes every 30 seconds or so. '''You must log out first and log back in using MFA in order to be able to switch roles!!!'''
== Switching Roles into Another AWS Account ==
Now that you have configured a password and enabled MFA, you will be allowed to "Switch Roles" into another account such that you can begin work there. The first time you switch roles into an account it will ask you a few questions, but subsequently it will remember which roles you have access to and they will become a menu item you can click on to quickly switch roles.
First, you need the name of the account you want to switch to. Select the name from the list at [[AWS Account List and Numbers]].
Let's assume that you want to switch to the 'pangenomics' AWS account, and you have been already granted access to do so by the cluster-admin group. After logging into the 'gi-gateway' account at the URL listed here (same as above):
https://gi-gateway.signin.aws.amazon.com/console
Do the following to switch roles into the 'pangenomics' account (as an example):
* Click '''"melinda@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, melinda@ucsc.edu is an example).
* Click the '''"Switch Role"''' option in the drop-down menu.
* In the following menu it will ask you about the role you will be assuming. In our example we will use the following:
** Account* = pangenomics
** Role* = developer
** Display Name = [leave blank, or use a short phrase]
** Color = [choose a color for this role]
* Then click the "Switch Role" button.
If all went well you should be dumped into the 'pangenomics' account, and you should be identified in the top right hand corner of the page as '''"developer @ pangenomics"''', indicating your role and the account you are active in. You can then work as normal in that account. If you have not yet been given access to that role, you will receive an error message and not be allowed to switch roles.
'''NOTE:''' When you switch roles, it may dump you into a region that you don't expect it to. Always verify the region you are in by looking at the top right of the web page - it will display your region there. Most of our stuff exists in "Oregon" (us-west-2), but some items appear in other regions on a per-case basis.
If you wish to switch context back to the 'gi-gateway' account in order to manage something, or to switch to another role in another account, simply:
* Click '''"developer @ pangenomics"''' in the top right corner of the window.
* Select '''"Back to melinda@ucsc.edu"'''
You will then be sent back to the 'gi-gateway' context, and you can add another role to switch into, manage your credentials and further switch roles.
== API Access and Secret Keys ==
If you require programmatic access to AWS, you will very likely be familiar with the AWS concept of Access Keys and Secret Keys, which can be used by scripts to authenticate yourself to AWS and use the APIs there without using the web console to authenticate. In the past, access keys and secret keys could be used by users with no further authentication. This introduces a security risk, as the management of those keys must be carefully guarded - if anyone gets your keys, they can rack up charges on your AWS account without your knowledge!
Using the "Assume Role" mechanism we are now using, Access Keys and Secret Keys can still be created by users '''while logged into the gi-gateway account only'''. Do not try to create keys while you have "Switched Roles" into another account. Keys you create in the top level 'gi-gateway' account will work for you in any sub-account you have access to switch roles to. You will need to do a little more configuration for your keys to work from a UNIX command line however.
To set up your access and secret keys for the first time (again, logged into the 'gi-gateway' account only), follow these instructions. Once you log into the gi-gateway web interface, click on your username in the top right corner of the browser window, then click "My Security Credentials". In that screen you will see an "Access Keys" section, and you will have one key listed. Delete that key (using the "Delete" button on the right side of the key), then create a new key using the "Create Access Key" button. It will show you your access and secret key ONCE, so make sure to copy and paste it somewhere.
It should be noted that we recommend awscli version 1.16.187 or later, as earlier versions have documented issues with using profiles and MFA related actions. You can determine your version of awscli by doing:
aws --version
=== Entering Base Credentials ===
Generically, if you plan on using keys for API Access, minimally you will need to configure the "aws" utility and then tweak the config a bit for our setup. To start, run "aws configure". It should look something like this (put in your access and secret keys that you created in the previous step):
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]:
Most folks do that to start. It creates two files:
~/.aws/config
~/.aws/credentials
Those two files are important to access AWS via the 'aws' command.
'''~/.aws/credentials'''
This file contains your access key and secret key, and should not need to be modified after running 'aws configure'. Your same keys can be used to access any roles in any accounts you have access to.
'''~/.aws/config'''
This file contains some account information you will need to tweak. There are a few ways you could set it up.
=== Adjusting Configuration for Toil or a Single Role ===
If you usually use a single role for a single project, or if you need to use Toil with a particular role, you should configure it like this, so that that role is automatically assumed for every operation by default:
[default]
region = us-west-2
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/melinda@ucsc.edu
duration_seconds = 43200
The "role_arn" line contains the role and account number you are accessing. You can see a list of live account numbers here:
[[AWS Account List and Numbers]]
Find the account number you need and enter it on the role_arn line, as well as the role name. You will get the role name from the cluster-admin group when you get access.
The 'mfa_serial' line contains the identifier for your MFA device. It will always look like '''"arn:aws:iam::652235167018:mfa/[your_iam_username]"'''. The account number listed there will always be "652235167018" because that is the account number of the top level "gi-gateway" account.
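As a quick sanity check, the mfa_serial value can be assembled from your IAM username like this (melinda@ucsc.edu is the same example user as above):
<code>
# The gi-gateway account number is fixed; only the username part varies.
gateway_account="652235167018"
iam_user="melinda@ucsc.edu"   # replace with your own IAM username
mfa_serial="arn:aws:iam::${gateway_account}:mfa/${iam_user}"
echo "$mfa_serial"   # arn:aws:iam::652235167018:mfa/melinda@ucsc.edu
</code>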
The "duration_seconds" parameter says that your session token will be 43200 seconds long (12 hours). That means you will only have to authenticate with MFA once every 12 hours. 12 hours is the maximum you can request, although you can specify less than that. This means it won't ask you for MFA every time you run a command for the next 12 hours.
Once that is configured, you should be able to use the aws command without any profile specified, and have it automatically assume a role to grant you access:
$ aws s3 ls
It will ask you for your MFA code and then run the command. The session token it creates is valid for 12 hours if you specified "duration_seconds = 43200" (if you omitted that line, the default session duration is one hour), so you can run other 'aws' CLI commands without re-authenticating with MFA for the duration of the session. After the session expires, you will need to authenticate via MFA again.
=== Adjusting Configuration for Multiple Roles ===
If you have multiple roles that you use equally often, and you don't need to use Toil, you can configure it something like this, with multiple profiles:
[default]
region = us-west-2
[profile pangenomics-developer]
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/melinda@ucsc.edu
duration_seconds = 43200
Once that is configured, you should be able to reference the profile you just created when using the aws command, like so:
$ aws s3 ls --profile pangenomics-developer
==Tag Your Resources==
When you start using AWS resources (instances, networks, etc), it is very important that you "tag" your resources with the "Owner" tag (note the capital "O"). "Owner" is the key, and the value assigned to it will be your IAM username (i.e. your email address). So, for example, if I spin up an instance, I would tag it during or after creation with something like:
Owner = bob@ucsc.edu
If you do not tag your instances, '''they will automatically be terminated within 10 minutes.''' Tag your instances especially, but tag every resource you create! This allows us to perform accounting tasks much more easily and allows the Program Managers to know which resources are controlled by whom.
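For example, a minimal sketch of tagging from the command line (the instance ID below is hypothetical; the create-tags call is commented out because it requires live credentials):
<code>
# Build the Owner tag from your IAM username:
owner="bob@ucsc.edu"
tag="Key=Owner,Value=${owner}"
echo "$tag"   # Key=Owner,Value=bob@ucsc.edu
# With credentials configured, the tag would be applied like this
# (i-0123456789abcdef0 is a placeholder instance ID):
#   aws ec2 create-tags --resources "i-0123456789abcdef0" --tags "$tag"
</code>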
609de9a14a96b0b114f4b8cb8ed582fff6302402
260
259
2022-07-22T23:53:41Z
Anovak
4
Fix link
wikitext
text/x-wiki
__TOC__
== Getting AWS (Amazon Web Services) Access ==
The Genomics Institute has a series of AWS accounts, each supporting different projects. If you become associated with one or more of those projects, you will need access to the corresponding account or accounts. We manage AWS IAM access through a single 'top level' account that everyone logs into; once you log in there, you "Switch Role" into the sub-account where you actually run things.
To get access, have your PI or Project Manager email cluster-admin (cluster-admin@soe.ucsc.edu) asking for an AWS account for you, naming in that email the projects you will need access to. The cluster-admin group will contact you with your login credentials. Once you log in, you can change your password if you wish, and you will be able to set up MFA (Multi-Factor Authentication) for your account. MFA is required in order to "Switch Role" into any of the sub-accounts for the projects you are working on.
The login URL to use when logging in to the top level account is listed below. The top level account is known as "gi-gateway":
https://gi-gateway.signin.aws.amazon.com/console
When you log in, you '''may''' see a couple of error messages on the AWS dashboard saying you don't have access to view certain resources - '''this is normal''', so just ignore them.
== Configuring Account Credentials ==
Once you log in to gi-gateway, you will have very few permissions to do anything there - which is normal, since you will not be working in that account anyway. The gi-gateway account exists only to authenticate you to AWS.
'''Changing Your Password'''
You can change your password by clicking on your username on the top right of the web browser window, just to the right of the little bell. If your username is melinda@ucsc.edu, for example:
* Click '''"melinda@ucsc.edu @ gi-gateway"''' on the top right of your browser window.
* Click the '''"My Security Credentials"''' drop-down menu option.
* Click the '''"Change Password"''' button to change your password.
Note that we have a password strength policy in place, so your password must conform to the following requirements:
* Your password must be at least 10 characters long
* Your password must contain at least one lowercase letter
* Your password must contain at least one non-alphanumeric character
* Your password must contain at least one number
You will also need to configure MFA on your account before you will be allowed to switch roles into another account.
'''Configuring MFA'''
The most common way to configure MFA (Multi-Factor Authentication) is with '''Google Authenticator''', a free app for Apple and Android phones and tablets; download it from your device's app store to get started. Other MFA apps may also work, but we have not tested everything out there.
Once you have Google Authenticator installed, log into the gi-gateway account using the above URL, then:
* Click '''"melinda@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, melinda@ucsc.edu is an example).
* Click the '''"Security Credentials"''' drop-down menu option.
* Scroll down to the MFA (Multi-Factor Authentication) section of the page, and click '''"Assign MFA Device"'''.
* In the following menu select '''"Authenticator App"'''. For the device name, use your username (the email address you use to log in).
* In the following window click the '''"Show QR Code"''' link, and the MFA QR barcode will appear on your screen.
* Open the Google Authenticator app on your mobile device, and click the little "+" symbol in the top right corner of the app to add an account.
* Select "Scan Barcode" in the Google Authenticator app to continue, and aim your mobile device's camera at the QR barcode.
* The new account's MFA device should then be set up, and you should see a 6-digit number with a small timer to the right of it. When asked, type one 6-digit code that it displays into your web browser, then wait for the next code to appear after the timer expires and type that into the second field. It should then inform you that you have successfully associated an MFA device with your account.
Once you have associated an MFA device with the 'gi-gateway' Account, '''log out''', then log back in. It will ask for your username and password, and then ask for your MFA code, which you can view by opening Google Authenticator and seeing what code it is displaying at that time. The code changes every 30 seconds or so. '''You must log out first and log back in using MFA in order to be able to switch roles!!!'''
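Incidentally, the rotating 6-digit code is standard TOTP (RFC 6238): an HMAC-SHA1 of the current 30-second time step, truncated to 6 digits. A minimal sketch using only the Python standard library, for the curious (the secret below is the RFC 4226 test value, not a real MFA seed; never put a real seed in a script):

```python
import hashlib
import hmac
import struct

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 of a counter, dynamically truncated."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                        # low nibble picks the offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, unix_time: int, step: int = 30) -> str:
    """RFC 6238 TOTP: HOTP computed over the current 30-second interval."""
    return hotp(secret, unix_time // step)

# RFC 4226 test secret (ASCII "12345678901234567890"); counter 0 yields "755224".
print(hotp(b"12345678901234567890", 0))
```

This is also why a code typed after the timer expires is rejected: the server computes the same HMAC for the current interval and compares.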
== Switching Roles into Another AWS Account ==
Now that you have configured a password and enabled MFA, you will be allowed to "Switch Roles" into another account such that you can begin work there. The first time you switch roles into an account it will ask you a few questions, but subsequently it will remember which roles you have access to and they will become a menu item you can click on to quickly switch roles.
First, you need the name of the account you want to switch to. Select the name from the list at [[AWS Account List and Numbers]].
Let's assume that you want to switch to the 'pangenomics' AWS account, and you have already been granted access to do so by the cluster-admin group. After logging into the 'gi-gateway' account at the URL listed here (same as above):
https://gi-gateway.signin.aws.amazon.com/console
Do the following to switch roles into the 'pangenomics' account (as an example):
* Click '''"melinda@ucsc.edu @ gi-gateway"''' on the top right of your browser window (again, melinda@ucsc.edu is an example).
* Click the '''"Switch Role"''' option in the drop-down menu.
* In the following menu it will ask you about the role you will be assuming. In our example we will use the following:
** Account* = pangenomics
** Role* = developer
** Display Name = [leave blank, or use a short phrase]
** Color = [choose a color for this role]
* Then click the "Switch Role" button.
If all went well you should be dumped into the 'pangenomics' account, and you should be identified in the top right hand corner of the page as '''"developer @ pangenomics"''', indicating your role and the account you are active in. You can then work as normal in that account. If you have not yet been given access to that role, you will receive an error message and not be allowed to switch roles.
'''NOTE:''' When you switch roles, it may dump you into a region that you don't expect it to. Always verify the region you are in by looking at the top right of the web page - it will display your region there. Most of our stuff exists in "Oregon" (us-west-2), but some items appear in other regions on a per-case basis.
If you wish to switch context back to the 'gi-gateway' account in order to manage something, or to switch to another role in another account, simply:
* Click '''"developer @ pangenomics"''' in the top right corner of the window.
* Select '''"Back to melinda@ucsc.edu"'''
You will then be sent back to the 'gi-gateway' context, where you can add another role to switch into, manage your credentials, or switch roles again.
== API Access and Secret Keys ==
If you require programmatic access to AWS, you will very likely be familiar with the AWS concept of Access Keys and Secret Keys, which scripts can use to authenticate to AWS and use the APIs without going through the web console. In the past, access keys and secret keys could be used with no further authentication. This introduces a security risk: those keys must be carefully guarded, because if anyone gets your keys, they can rack up charges on your AWS account without your knowledge!
Using the "Assume Role" mechanism we now employ, Access Keys and Secret Keys can still be created by users '''while logged into the gi-gateway account only'''. Do not try to create keys while you have "Switched Roles" into another account. Keys you create in the top level 'gi-gateway' account will work for you in any sub-account you have access to switch roles to. However, you will need to do a little more configuration for your keys to work from a UNIX command line.
To set up your access and secret keys for the first time (again, while logged into the 'gi-gateway' account only), follow these instructions. Once you log into the gi-gateway web interface, click on your username in the top right corner of the browser window, then click "My Security Credentials". On that screen you will see an "Access Keys" section with one key listed. Delete that key (using the "Delete" button on the right side of the key), then create a new key using the "Create Access Key" button. It will show you your access key and secret key only ONCE, so make sure to copy them somewhere safe.
We recommend awscli version 1.16.187 or later, as earlier versions have documented issues with profiles and MFA-related actions. You can determine your version of awscli with:
aws --version
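If a setup script needs to enforce that minimum, the version string can be compared numerically. A small hypothetical helper (assuming the usual "aws-cli/X.Y.Z ..." output format; the function names are illustrative):

```python
# Recommended minimum awscli version from this page.
MINIMUM = (1, 16, 187)

def awscli_version(version_output: str) -> tuple:
    """Parse e.g. "aws-cli/1.16.200 Python/3.8.10 Linux/5.4.0" to (1, 16, 200)."""
    first_field = version_output.split()[0]        # "aws-cli/1.16.200"
    return tuple(int(p) for p in first_field.split("/")[1].split("."))

def is_supported(version_output: str) -> bool:
    """True when the installed awscli meets the recommended minimum."""
    return awscli_version(version_output) >= MINIMUM
```

For example, `is_supported("aws-cli/1.16.200 Python/3.8.10 Linux/5.4.0")` is true, while a 1.16.100 install would not be.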
=== Entering Base Credentials ===
If you plan on using keys for API access, at minimum you will need to configure the "aws" utility and then tweak the config a bit for our setup. To start, run "aws configure". It should look something like this (enter the access and secret keys you created in the previous step):
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]:
Most folks do that to start. It creates two files:
~/.aws/config
~/.aws/credentials
Those two files are important to access AWS via the 'aws' command.
'''~/.aws/credentials'''
This file contains your access key and secret key, and should not need to be modified after running 'aws configure'. Your same keys can be used to access any roles in any accounts you have access to.
'''~/.aws/config'''
This file contains some account information you will need to tweak. There are a few ways you could set it up.
=== Adjusting Configuration for Toil or a Single Role ===
If you usually use a single role for a single project, or if you need to use Toil with a particular role, you should configure it like this, so that the role is automatically assumed by default for every operation:
[default]
region = us-west-2
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/melinda@ucsc.edu
duration_seconds = 43200
The "role_arn" line contains the role and account number you are accessing. You can see a list of live account numbers here:
[[AWS Account List and Numbers]]
Find the account number you need and enter it on the role_arn line, as well as the role name. You will get the role name from the cluster-admin group when you get access.
The 'mfa_serial' line contains the identifier for your MFA device. It will always look like '''"arn:aws:iam::652235167018:mfa/[your_iam_username]"'''. The account number listed there will always be "652235167018" because that is the account number of the top level "gi-gateway" account.
The "duration_seconds" parameter sets your session token's lifetime to 43200 seconds (12 hours), which is the maximum you can request (you can specify less). That means you will only have to authenticate with MFA once every 12 hours, rather than every time you run a command.
Once that is configured, you should be able to use the aws command without any profile specified, and have it automatically assume a role to grant you access:
$ aws s3 ls
It will ask you for your MFA code and then run the command. The token it creates is valid for 12 hours if you specified "duration_seconds = 43200" (or one hour, the default, if you omitted that line), so you can run other 'aws' CLI commands without re-authenticating with MFA for the duration of the session. After the session expires, you will need to authenticate via MFA again.
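Since ~/.aws/config is plain INI, a profile can be sanity-checked before you rely on it. A minimal sketch with Python's standard configparser, using the example values from above (the account numbers and username are this page's placeholders, not real credentials):

```python
import configparser

# The single-role example config from above, inlined as a string for
# illustration; in practice you would read your real ~/.aws/config file.
SAMPLE = """\
[default]
region = us-west-2
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/melinda@ucsc.edu
duration_seconds = 43200
"""

cfg = configparser.ConfigParser()
cfg.read_string(SAMPLE)
profile = cfg["default"]

# 43200 seconds (12 hours) is the maximum session length you can request.
assert int(profile["duration_seconds"]) <= 43200
# The mfa_serial always lives in the gi-gateway account (652235167018).
assert profile["mfa_serial"].startswith("arn:aws:iam::652235167018:mfa/")
```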
=== Adjusting Configuration for Multiple Roles ===
If you have multiple roles that you use equally often, and you don't need to use Toil, you can configure it something like this, with multiple profiles:
[default]
region = us-west-2
[profile pangenomics-developer]
source_profile = default
role_arn = arn:aws:iam::422448306679:role/developer
mfa_serial = arn:aws:iam::652235167018:mfa/melinda@ucsc.edu
duration_seconds = 43200
Once that is configured, you should be able to reference the profile you just created when using the aws command, like so:
$ aws s3 ls --profile pangenomics-developer
==Tag Your Resources==
When you start using AWS resources (instances, networks, etc), it is very important that you "tag" your resources with the "Owner" tag (note the capital "O"). "Owner" is the key, and the value assigned to it will be your IAM username (i.e. your email address). So, for example, if I spin up an instance, I would tag it during or after creation with something like:
Owner = bob@ucsc.edu
If you do not tag your instances, '''they will automatically be terminated within 10 minutes.''' Tag your instances especially, but tag every resource you create! This makes accounting tasks much easier and lets the Program Managers know which resources are controlled by whom.
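Tagging can also be done programmatically. A hypothetical sketch of the tag structure the AWS APIs expect (the helper name, instance ID, and commented-out boto3 call are illustrative, not a prescribed workflow):

```python
def owner_tag(iam_username: str) -> list:
    """Build the required Owner tag (note the capital "O") for an IAM username."""
    return [{"Key": "Owner", "Value": iam_username}]

# Applying it with boto3, assuming boto3 is installed and your credentials
# and assume-role config are set up as described above:
# import boto3
# ec2 = boto3.client("ec2", region_name="us-west-2")
# ec2.create_tags(Resources=["i-0123456789abcdef0"],
#                 Tags=owner_tag("bob@ucsc.edu"))
```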
= How to access the public servers =
== How to Gain Access to the Public Genomics Institute Compute Servers ==
If you need access to the Genomics Institute compute servers please complete this request form:
https://app.smartsheet.com/b/form/a76dbd90ba0240ab9ea9d39b390586ce. There are two parts to this process:
# For the user: please fill in ALL required fields and submit.
# For the Sponsor/PI: you will receive an email from Smartsheet; please fill in all required fields and submit.
Once we receive your completed request, we will create your account and go over the details with you in a short Zoom meeting.
== Account and Storage Cost ==
Costs for having an active UNIX account and for storage (per TB) are listed in this document, under "Genomics Project Support", specifically "Genomics IT Systems User Support per user" and "Genomics Data Storage per TB":
https://planning.ucsc.edu/budget/rates-and-assessments/recharge-rates/docs/2021-22-approved-recharge-rates.pdf
== Account Expiration ==
Your UNIX account will have an expiration date associated with it after creation, as requested by your sponsor. Please take note of this expiration date when your account is created.
You will receive notice by email when your account is about to expire. To renew, simply ask the PI that sponsored you (who will be named in the notice) to email '''cluster-admin@soe.ucsc.edu''' requesting that your account be renewed for another year, or any other requested amount of time.
If your account expires, the account will be suspended and you will no longer be able to login or view any data you may have in our systems. Any automated scripts (owned by you) that run via cron or other mechanisms will cease to function.
== Server Types and Management==
You can log into our public compute servers via SSH:
* '''courtyard.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space, CentOS 7.9
* '''plaza.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space, CentOS 7.9
* '''park.gi.ucsc.edu''': 256GB RAM, 32 cores, 5TB local scratch space, Ubuntu 22.04.1
These servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on them, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory is located at "/public/home/username" and has a 30GB quota. The group storage directories are created per PI, and each has a 15TB quota. For example, if David Haussler is the PI you report to directly, the directory would be /public/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the '/usr/bin/viewquota' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /public/groups/hausslerlab for example, you would do:
$ viewquota hausslerlab
Project quota on /export (/dev/mapper/export)
Project ID Used Soft Hard Warn/Grace
---------- ---------------------------------
hausslerlab 1.8T 15T 16T 00 [------]
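If you want to track usage from a script, the size columns are easy to interpret. A hypothetical helper (not part of viewquota) that converts sizes like "1.8T" into bytes:

```python
# Hypothetical helper for the viewquota size columns ("1.8T", "15T", "16T").
UNIT_POWERS = {"K": 1, "M": 2, "G": 3, "T": 4}

def to_bytes(size: str) -> int:
    """Convert a human-readable size such as "1.8T" to a byte count."""
    suffix = size[-1].upper()
    if suffix in UNIT_POWERS:
        return int(float(size[:-1]) * 1024 ** UNIT_POWERS[suffix])
    return int(size)  # plain byte count, no unit suffix

used, soft = to_bytes("1.8T"), to_bytes("15T")
print(f"{100 * used / soft:.0f}% of the soft quota used")
```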
== Actually Doing Work and Computing ==
When doing research and running jobs, please be mindful of your resource consumption on the server you are on. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are not sure of your potential RAM, CPU or disk impact, start small with one or two processes and work your way up from there. Also, before running your jobs, use the 'top' command to see who and what else is running and what resources are already being consumed. If, after starting a process, you realize that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Serving Files to the Public via the Web ==
If you want to setup a web page on courtyard, or serve files over HTTP from there, do this:
mkdir /public/home/''your_username''/public_html
chmod 755 /public/home/''your_username''/public_html
Put data in the public_html directory. The URL will be:
http://public.gi.ucsc.edu/''~username''/
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear in the event of a disk failure or anything else. Do not store important data there. If it is important, it should be moved somewhere else very soon after creation.
48cd5601c042ffaa6a7cac86796728e1aa287676
268
267
2023-03-01T05:04:40Z
Weiler
3
wikitext
text/x-wiki
== How to Gain Access to the Public Genomics Institute Compute Servers ==
If you need access to the Genomics Institute compute servers, please complete this request form:
https://app.smartsheet.com/b/form/a76dbd90ba0240ab9ea9d39b390586ce. There are two parts to this process.
1. User: please fill in ALL required fields and submit.
2. Sponsor/PI: you will receive an email from Smartsheet. Please fill in all required fields and submit.
Once we receive your completed request, we will create your account and go over the details with you in a short Zoom meeting.
== Account and Storage Cost ==
Costs for having an active UNIX account and for storage (per TB) are listed in this document, under "Genomics Project Support", specifically "Genomics IT Systems User Support per user" and "Genomics Data Storage per TB":
https://planning.ucsc.edu/budget/rates-and-assessments/recharge-rates/docs/2021-22-approved-recharge-rates.pdf
== Account Expiration ==
Your UNIX account is created with an expiration date, as requested by your sponsor. Please take note of this date when your account is created.
You will receive notice by email when your account is about to expire. To renew, simply ask the PI who sponsored you (named in the notice) to email '''cluster-admin@soe.ucsc.edu''' requesting that your account be renewed for another year, or any other amount of time.
If your account expires, it will be suspended: you will no longer be able to log in or view any data you have on our systems, and any automated scripts (owned by you) that run via cron or other mechanisms will cease to function.
== Server Types and Management ==
You can log into our public compute servers via SSH:
'''courtyard.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space, CentOS 7.9
'''plaza.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space, CentOS 7.9
'''park.gi.ucsc.edu''': 256GB RAM, 32 cores, 5TB local scratch space, Ubuntu 22.04.1
These servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on them, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory is located at /public/home/username and has a 30GB quota. Group storage directories are created per PI, and each has a 15TB quota. For example, if David Haussler is the PI you report to directly, the directory would be /public/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the entire lab it belongs to, so be mindful of everyone's data usage and share the 15TB available per group accordingly.
On the compute servers you can check your group's current quota usage with the '/usr/bin/viewquota' command. You can only check the quota of a group you belong to (you will be a member of the UNIX group of the same name). For example, to check the quota usage of /public/groups/hausslerlab:
$ viewquota hausslerlab
Project quota on /export (/dev/mapper/export)
Project ID      Used   Soft   Hard   Warn/Grace
----------      ----   ----   ----   ----------
hausslerlab     1.8T    15T    16T   00 [------]
== Actually Doing Work and Computing ==
When doing research, running jobs, and the like, please be mindful of your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are unsure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Before running anything, check what is already happening on the server with the 'top' command to see who and what is running and what resources are already being consumed. If, after starting a process, the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
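As a quick pre-flight check before launching anything, something along these lines (standard Linux tools; exact output varies by distribution) shows how busy the machine already is:

```shell
# Cores on this machine vs. current load average (1, 5, 15 min)
nproc
uptime

# Memory actually available to new processes (see the "available" column)
free -h

# Ten busiest processes by CPU, then by memory
ps aux --sort=-%cpu | head -n 11
ps aux --sort=-%mem | head -n 11
```

If the load average is already near the core count, or available memory is low, scale your plans down accordingly.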
== Serving Files to the Public via the Web ==
If you want to set up a web page on courtyard, or serve files over HTTP from there, do the following:
mkdir /public/home/''your_username''/public_html
chmod 755 /public/home/''your_username''/public_html
Put data in the public_html directory. The URL will be:
http://public.gi.ucsc.edu/''~username''/
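The permission bits matter because the web server runs as a different user and needs read and traverse access; mode 755 grants exactly that. A small sketch (the directory is created in the current directory just for the demo):

```shell
# 755 = owner rwx, group r-x, other r-x; the web server needs the
# r-x bits on "other" to enter the directory and read files in it.
mkdir -p public_html
chmod 755 public_html
stat -c '%a' public_html    # prints: 755
```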
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use for temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear in the event of a disk failure or other problem. Do not store important data there; if it is important, move it somewhere else soon after it is created.
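One defensive pattern, sketched here with a fallback to /tmp so it runs anywhere, is to create a private working directory with mktemp and copy anything worth keeping off /scratch before the shell exits:

```shell
# Prefer /scratch when it exists and is writable; otherwise fall back to /tmp
base=/tmp
[ -d /scratch ] && [ -w /scratch ] && base=/scratch

# Private working directory, removed automatically on exit
workdir=$(mktemp -d "$base/${USER:-user}.XXXXXX")
trap 'rm -rf "$workdir"' EXIT

echo "scratch work happening in $workdir"
# ... generate temporary files under $workdir, then copy anything
# important to group storage (e.g. /public/groups/<yourlab>/) before exiting ...
```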
Overview of using Slurm
When using Slurm, you will need to log into the Slurm head node (currently phoenix-01.gi.ucsc.edu, a one-node cluster at the moment). Once you have SSH'd in, you can execute Slurm batch or interactive commands.
== Submit a Slurm Batch Job ==
To submit a Slurm batch job, you will need to create a directory that you have read and write access to on all the nodes (which will often be a shared space). Let's say I have a batch named "experiment-1". I would create that directory in my group's area:
% mkdir -p /public/groups/clusteradmin/weiler/slurm-jobs/experiment-1
% cd /public/groups/clusteradmin/weiler/slurm-jobs/experiment-1
Then you will need to create your job submission batch file; it will look something like this. My file is called 'slurm-test.sh':
% vim slurm-test.sh
Then populate the file as necessary:
#!/bin/bash
# Job name:
#SBATCH --job-name=weiler_test
#
# Partition - This is the queue it goes in:
#SBATCH --partition=batch
#
# Where to send email (optional)
#SBATCH --mail-user=weiler@ucsc.edu
#
# Number of nodes you need per job:
#SBATCH --nodes=1
#
# Memory needed for the jobs. Try very hard to make this accurate. DEFAULT = 4gb
#SBATCH --mem=4gb
#
# Number of tasks (one for each GPU desired for use case) (example):
#SBATCH --ntasks=1
#
# Processors per task:
# At least eight times the number of GPUs needed for nVidia RTX A5500
#SBATCH --cpus-per-task=1
#
# Number of GPUs, this can be in the format of "gpu:[1-4]", or "gpu:K80:[1-4]" with the type included (optional)
#SBATCH --gres=gpu:1
#
# Standard output and error log
#SBATCH --output=serial_test_%j.log
#
# Wall clock limit in hrs:min:sec:
#SBATCH --time=00:00:30
#
## Command(s) to run (example):
pwd; hostname; date
module load python
echo "Running test script on a single CPU core"
python /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1/mytest.py
date
Keep the "SBATCH" lines commented; the scheduler reads them anyway. If you don't need a particular option, just leave it out of the file.
To submit the batch job:
% sbatch slurm-test.sh
Submitted batch job 7
The job(s) will then be scheduled. You can see the state of the queue like so:
% squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
7 batch weiler_t weiler R 0:07 1 phoenix-01
The job will write any STDOUT or STDERR to the directory you launched it from. Beyond that, it simply runs whatever commands the script contains, even if they produce no STDOUT.
== Launching Several Jobs at Once ==
You can launch many jobs at once using the $SLURM_ARRAY_TASK_ID variable. Add something like the following to your batch submission file:
#SBATCH --array=0-31
#SBATCH --output=array_job_%A_task_%a.out
#SBATCH --error=array_job_%A_task_%a.err
## Command(s) to run:
echo "I am task $SLURM_ARRAY_TASK_ID"
== CGROUPS and Resource Management ==
Our installation of Slurm uses Linux cgroups, which place a hard resource cap on jobs. If you declare that your job needs 4GB of RAM and it uses 5GB, it will fail with an OOM (out-of-memory) error. The same goes for CPU and GPU resources: if your job uses more than you requested, it will fail. Likewise with the "--time" option in the batch file: your job will be killed if it runs longer than the limit you specify. This keeps the nodes from crashing under runaway jobs that use more resources than you expect.
So... TEST YOUR JOBS! Find out how many resources a single job needs before you launch 100 of them.
4a694445139e7f54efee12a1f03e7707e7ae7d7a
293
292
2023-04-09T21:19:50Z
Weiler
3
wikitext
text/x-wiki
When using Slurm, you will need to log into the Slurm head node (currently phoenix-01.gi.ucsc.edu, at present a one-node cluster). Once you have ssh'd in, you can execute Slurm batch or interactive commands.
== Submit a Slurm Batch Job ==
To submit a Slurm batch job, you will need to create a directory that you have read and write access to on all the nodes (which will often be a shared space). Let's say I have a batch named "experiment-1". I would create that directory in my group's area:
% mkdir -p /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1
% cd /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1
Then you will need to create your job submission batch file; it will look something like this. My file is called 'slurm-test.sh':
% vim slurm-test.sh
Then populate the file as necessary:
#!/bin/bash
# Job name:
#SBATCH --job-name=weiler_test
#
# Partition - This is the queue it goes in:
#SBATCH --partition=batch
#
# Where to send email (optional)
#SBATCH --mail-user=weiler@ucsc.edu
#
# Number of nodes you need per job:
#SBATCH --nodes=1
#
# Memory needed for the jobs. Try very hard to make this accurate. DEFAULT = 4gb
#SBATCH --mem=4gb
#
# Number of tasks (one for each GPU desired for use case) (example):
#SBATCH --ntasks=1
#
# Processors per task:
# At least eight times the number of GPUs needed for nVidia RTX A5500
#SBATCH --cpus-per-task=1
#
# Number of GPUs, this can be in the format of "gpu:[1-4]", or "gpu:K80:[1-4]" with the type included (optional)
#SBATCH --gres=gpu:1
#
# Standard output and error log
#SBATCH --output=serial_test_%j.log
#
# Wall clock limit in hrs:min:sec:
#SBATCH --time=00:00:30
#
## Command(s) to run (example):
pwd; hostname; date
module load python
echo "Running test script on a single CPU core"
python /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1/mytest.py
date
Keep the "SBATCH" lines commented; the scheduler reads them anyway. If you don't need a particular option, just leave it out of the file.
To submit the batch job:
% sbatch slurm-test.sh
Submitted batch job 7
The job(s) will then be scheduled. You can see the state of the queue like so:
% squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
7 batch weiler_t weiler R 0:07 1 phoenix-01
The job will write any STDOUT or STDERR to the directory you launched it from. Beyond that, it simply runs whatever commands the script contains, even if they produce no STDOUT.
== Launching Several Jobs at Once ==
You can launch many jobs at once using the $SLURM_ARRAY_TASK_ID variable. Add something like the following to your batch submission file:
#SBATCH --array=0-31
#SBATCH --output=array_job_%A_task_%a.out
#SBATCH --error=array_job_%A_task_%a.err
## Command(s) to run:
echo "I am task $SLURM_ARRAY_TASK_ID"
== CGROUPS and Resource Management ==
Our installation of Slurm uses Linux cgroups, which place a hard resource cap on jobs. If you declare that your job needs 4GB of RAM and it uses 5GB, it will fail with an OOM (out-of-memory) error. The same goes for CPU and GPU resources: if your job uses more than you requested, it will fail. Likewise with the "--time" option in the batch file: your job will be killed if it runs longer than the limit you specify. This keeps the nodes from crashing under runaway jobs that use more resources than you expect.
So... TEST YOUR JOBS! Find out how many resources a single job needs before you launch 100 of them.
513be43c2adcbfc334bab414814b3af82aad09db
294
293
2023-04-09T21:21:18Z
Weiler
3
wikitext
text/x-wiki
When using Slurm, you will need to log into the Slurm head node (currently phoenix-01.gi.ucsc.edu, at present a one-node cluster). Once you have ssh'd in, you can execute Slurm batch or interactive commands.
== Submit a Slurm Batch Job ==
To submit a Slurm batch job, you will need to create a directory that you have read and write access to on all the nodes (which will often be a shared space). Let's say I have a batch named "experiment-1". I would create that directory in my group's area:
% mkdir -p /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1
% cd /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1
Then you will need to create your job submission batch file; it will look something like this. My file is called 'slurm-test.sh':
% vim slurm-test.sh
Then populate the file as necessary:
#!/bin/bash
# Job name:
#SBATCH --job-name=weiler_test
#
# Partition - This is the queue it goes in:
#SBATCH --partition=main
#
# Where to send email (optional)
#SBATCH --mail-user=weiler@ucsc.edu
#
# Number of nodes you need per job:
#SBATCH --nodes=1
#
# Memory needed for the jobs. Try very hard to make this accurate. DEFAULT = 4gb
#SBATCH --mem=4gb
#
# Number of tasks (one for each GPU desired for use case) (example):
#SBATCH --ntasks=1
#
# Processors per task:
# At least eight times the number of GPUs needed for nVidia RTX A5500
#SBATCH --cpus-per-task=1
#
# Number of GPUs, this can be in the format of "gpu:[1-4]", or "gpu:K80:[1-4]" with the type included (optional)
#SBATCH --gres=gpu:1
#
# Standard output and error log
#SBATCH --output=serial_test_%j.log
#
# Wall clock limit in hrs:min:sec:
#SBATCH --time=00:00:30
#
## Command(s) to run (example):
pwd; hostname; date
module load python
echo "Running test script on a single CPU core"
python /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1/mytest.py
date
Keep the "SBATCH" lines commented; the scheduler reads them anyway. If you don't need a particular option, just leave it out of the file.
To submit the batch job:
% sbatch slurm-test.sh
Submitted batch job 7
The job(s) will then be scheduled. You can see the state of the queue like so:
% squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
7 batch weiler_t weiler R 0:07 1 phoenix-01
The job will write any STDOUT or STDERR to the directory you launched it from. Beyond that, it simply runs whatever commands the script contains, even if they produce no STDOUT.
== Launching Several Jobs at Once ==
You can launch many jobs at once using the $SLURM_ARRAY_TASK_ID variable. Add something like the following to your batch submission file:
#SBATCH --array=0-31
#SBATCH --output=array_job_%A_task_%a.out
#SBATCH --error=array_job_%A_task_%a.err
## Command(s) to run:
echo "I am task $SLURM_ARRAY_TASK_ID"
== CGROUPS and Resource Management ==
Our installation of Slurm uses Linux cgroups, which place a hard resource cap on jobs. If you declare that your job needs 4GB of RAM and it uses 5GB, it will fail with an OOM (out-of-memory) error. The same goes for CPU and GPU resources: if your job uses more than you requested, it will fail. Likewise with the "--time" option in the batch file: your job will be killed if it runs longer than the limit you specify. This keeps the nodes from crashing under runaway jobs that use more resources than you expect.
So... TEST YOUR JOBS! Find out how many resources a single job needs before you launch 100 of them.
9e1e3e8033d01ef959d2c15bde614164b5599bff
295
294
2023-04-09T21:24:36Z
Weiler
3
wikitext
text/x-wiki
When using Slurm, you will need to log into the Slurm head node (currently phoenix-01.gi.ucsc.edu, at present a one-node cluster). Once you have ssh'd in, you can execute Slurm batch or interactive commands.
== Submit a Slurm Batch Job ==
To submit a Slurm batch job, you will need to create a directory that you have read and write access to on all the nodes (which will often be a shared space). Let's say I have a batch named "experiment-1". I would create that directory in my group's area:
% mkdir -p /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1
% cd /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1
Then you will need to create your job submission batch file; it will look something like this. My file is called 'slurm-test.sh':
% vim slurm-test.sh
Then populate the file as necessary:
#!/bin/bash
# Job name:
#SBATCH --job-name=weiler_test
#
# Partition - This is the queue it goes in:
#SBATCH --partition=main
#
# Where to send email (optional)
#SBATCH --mail-user=weiler@ucsc.edu
#
# Number of nodes you need per job:
#SBATCH --nodes=1
#
# Memory needed for the jobs. Try very hard to make this accurate. DEFAULT = 4gb
#SBATCH --mem=4gb
#
# Number of tasks (one for each GPU desired for use case) (example):
#SBATCH --ntasks=1
#
# Processors per task:
# At least eight times the number of GPUs needed for nVidia RTX A5500
#SBATCH --cpus-per-task=1
#
# Number of GPUs, this can be in the format of "gpu:[1-4]", or "gpu:K80:[1-4]" with the type included (optional)
#SBATCH --gres=gpu:1
#
# Standard output and error log
#SBATCH --output=serial_test_%j.log
#
# Wall clock limit in hrs:min:sec:
#SBATCH --time=00:00:30
#
## Command(s) to run (example):
pwd; hostname; date
echo "Running test script on a single CPU core"
echo "Test done!"
date
Keep the "SBATCH" lines commented; the scheduler reads them anyway. If you don't need a particular option, just leave it out of the file.
To submit the batch job:
% sbatch slurm-test.sh
Submitted batch job 7
The job(s) will then be scheduled. You can see the state of the queue like so:
% squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
7 batch weiler_t weiler R 0:07 1 phoenix-01
The job will write any STDOUT or STDERR to the directory you launched it from. Beyond that, it simply runs whatever commands the script contains, even if they produce no STDOUT.
== Launching Several Jobs at Once ==
You can launch many jobs at once using the $SLURM_ARRAY_TASK_ID variable. Add something like the following to your batch submission file:
#SBATCH --array=0-31
#SBATCH --output=array_job_%A_task_%a.out
#SBATCH --error=array_job_%A_task_%a.err
## Command(s) to run:
echo "I am task $SLURM_ARRAY_TASK_ID"
== CGROUPS and Resource Management ==
Our installation of Slurm uses Linux cgroups, which place a hard resource cap on jobs. If you declare that your job needs 4GB of RAM and it uses 5GB, it will fail with an OOM (out-of-memory) error. The same goes for CPU and GPU resources: if your job uses more than you requested, it will fail. Likewise with the "--time" option in the batch file: your job will be killed if it runs longer than the limit you specify. This keeps the nodes from crashing under runaway jobs that use more resources than you expect.
So... TEST YOUR JOBS! Find out how many resources a single job needs before you launch 100 of them.
0d924abde20ecde52cfdc15d304810b8ce1080cd
296
295
2023-04-09T21:26:42Z
Weiler
3
wikitext
text/x-wiki
When using Slurm, you will need to log into the Slurm head node (currently phoenix-01.gi.ucsc.edu, at present a one-node cluster). Once you have ssh'd in, you can execute Slurm batch or interactive commands.
== Submit a Slurm Batch Job ==
To submit a Slurm batch job, you will need to create a directory that you have read and write access to on all the nodes (which will often be a shared space). Let's say I have a batch named "experiment-1". I would create that directory in my group's area:
% mkdir -p /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1
% cd /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1
Then you will need to create your job submission batch file; it will look something like this. My file is called 'slurm-test.sh':
% vim slurm-test.sh
Then populate the file as necessary:
#!/bin/bash
# Job name:
#SBATCH --job-name=weiler_test
#
# Partition - This is the queue it goes in:
#SBATCH --partition=main
#
# Where to send email (optional)
#SBATCH --mail-user=weiler@ucsc.edu
#
# Number of nodes you need per job:
#SBATCH --nodes=1
#
# Memory needed for the jobs. Try very hard to make this accurate. DEFAULT = 4gb
#SBATCH --mem=4gb
#
# Number of tasks (one for each GPU desired for use case) (example):
#SBATCH --ntasks=1
#
# Processors per task:
# At least eight times the number of GPUs needed for nVidia RTX A5500
#SBATCH --cpus-per-task=1
#
# Number of GPUs, this can be in the format of "gpu:[1-4]", or "gpu:K80:[1-4]" with the type included (optional)
#SBATCH --gres=gpu:1
#
# Standard output and error log
#SBATCH --output=serial_test_%j.log
#
# Wall clock limit in hrs:min:sec:
#SBATCH --time=00:00:30
#
## Command(s) to run (example):
pwd; hostname; date
echo "Running test script on a single CPU core"
sleep 5
echo "Test done!"
date
Keep the "SBATCH" lines commented; the scheduler reads them anyway. If you don't need a particular option, just leave it out of the file.
To submit the batch job:
% sbatch slurm-test.sh
Submitted batch job 7
The job(s) will then be scheduled. You can see the state of the queue like so:
% squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
7 batch weiler_t weiler R 0:07 1 phoenix-01
The job will write any STDOUT or STDERR to the directory you launched it from. Beyond that, it simply runs whatever commands the script contains, even if they produce no STDOUT.
== Launching Several Jobs at Once ==
You can launch many jobs at once using the $SLURM_ARRAY_TASK_ID variable. Add something like the following to your batch submission file:
#SBATCH --array=0-31
#SBATCH --output=array_job_%A_task_%a.out
#SBATCH --error=array_job_%A_task_%a.err
## Command(s) to run:
echo "I am task $SLURM_ARRAY_TASK_ID"
== CGROUPS and Resource Management ==
Our installation of Slurm uses Linux cgroups, which place a hard resource cap on jobs. If you declare that your job needs 4GB of RAM and it uses 5GB, it will fail with an OOM (out-of-memory) error. The same goes for CPU and GPU resources: if your job uses more than you requested, it will fail. Likewise with the "--time" option in the batch file: your job will be killed if it runs longer than the limit you specify. This keeps the nodes from crashing under runaway jobs that use more resources than you expect.
So... TEST YOUR JOBS! Find out how many resources a single job needs before you launch 100 of them.
acde99e28da56d222e2c3d2965cff9c352e8b71d
297
296
2023-05-02T03:12:06Z
Weiler
3
wikitext
text/x-wiki
When using Slurm, you will need to log into the Slurm head node (currently phoenix-01.gi.ucsc.edu, at present a one-node cluster). Once you have ssh'd in, you can execute Slurm batch or interactive commands.
== Submit a Slurm Batch Job ==
To submit a Slurm batch job, you will need to create a directory that you have read and write access to on all the nodes (which will often be a shared space). Let's say I have a batch named "experiment-1". I would create that directory in my group's area:
% mkdir -p /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1
% cd /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1
Then you will need to create your job submission batch file; it will look something like this. My file is called 'slurm-test.sh':
% vim slurm-test.sh
Then populate the file as necessary:
#!/bin/bash
# Job name:
#SBATCH --job-name=weiler_test
#
# Partition - This is the queue it goes in:
#SBATCH --partition=main
#
# Where to send email (optional)
#SBATCH --mail-user=weiler@ucsc.edu
#
# Number of nodes you need per job:
#SBATCH --nodes=1
#
# Memory needed for the jobs. Try very hard to make this accurate. DEFAULT = 4gb
#SBATCH --mem=4gb
#
# Number of tasks (one for each GPU desired for use case) (example):
#SBATCH --ntasks=1
#
# Processors per task:
# At least eight times the number of GPUs needed for nVidia RTX A5500
#SBATCH --cpus-per-task=1
#
# Number of GPUs, this can be in the format of "gpu:[1-4]", or "gpu:A5500:[1-4]" with the type included (optional)
#SBATCH --gres=gpu:1
#
# Standard output and error log
#SBATCH --output=serial_test_%j.log
#
# Wall clock limit in hrs:min:sec:
#SBATCH --time=00:00:30
#
## Command(s) to run (example):
pwd; hostname; date
echo "Running test script on a single CPU core"
sleep 5
echo "Test done!"
date
Keep the "SBATCH" lines commented; the scheduler reads them anyway. If you don't need a particular option, just leave it out of the file.
To submit the batch job:
% sbatch slurm-test.sh
Submitted batch job 7
The job(s) will then be scheduled. You can see the state of the queue like so:
% squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
7 batch weiler_t weiler R 0:07 1 phoenix-01
The job will write any STDOUT or STDERR to the directory you launched it from. Beyond that, it simply runs whatever commands the script contains, even if they produce no STDOUT.
== Launching Several Jobs at Once ==
You can launch many jobs at once using the $SLURM_ARRAY_TASK_ID variable. Add something like the following to your batch submission file:
#SBATCH --array=0-31
#SBATCH --output=array_job_%A_task_%a.out
#SBATCH --error=array_job_%A_task_%a.err
## Command(s) to run:
echo "I am task $SLURM_ARRAY_TASK_ID"
== CGROUPS and Resource Management ==
Our installation of Slurm uses Linux cgroups, which place a hard resource cap on jobs. If you declare that your job needs 4GB of RAM and it uses 5GB, it will fail with an OOM (out-of-memory) error. The same goes for CPU and GPU resources: if your job uses more than you requested, it will fail. Likewise with the "--time" option in the batch file: your job will be killed if it runs longer than the limit you specify. This keeps the nodes from crashing under runaway jobs that use more resources than you expect.
So... TEST YOUR JOBS! Find out how many resources a single job needs before you launch 100 of them.
52112e5a3fe89f96de5e280d77a844b3f0766d9e
300
297
2023-05-03T20:19:22Z
Weiler
3
wikitext
text/x-wiki
When using Slurm, you will need to log into the Slurm head node (currently phoenix-01.gi.ucsc.edu, at present a one-node cluster). Once you have ssh'd in, you can execute Slurm batch or interactive commands.
== Submit a Slurm Batch Job ==
To submit a Slurm batch job, you will need to create a directory that you have read and write access to on all the nodes (which will often be a shared space). Let's say I have a batch named "experiment-1". I would create that directory in my group's area:
% mkdir -p /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1
% cd /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1
Then you will need to create your job submission batch file; it will look something like this. My file is called 'slurm-test.sh':
% vim slurm-test.sh
Then populate the file as necessary:
#!/bin/bash
# Job name:
#SBATCH --job-name=weiler_test
#
# Partition - This is the queue it goes in:
#SBATCH --partition=main
#
# Where to send email (optional)
#SBATCH --mail-user=weiler@ucsc.edu
#
# Number of nodes you need per job:
#SBATCH --nodes=1
#
# Memory needed for the jobs. Try very hard to make this accurate. DEFAULT = 4gb
#SBATCH --mem=4gb
#
# Number of tasks (one for each GPU desired for use case) (example):
#SBATCH --ntasks=1
#
# Processors per task:
# At least eight times the number of GPUs needed for nVidia RTX A5500
#SBATCH --cpus-per-task=1
#
# Number of GPUs, this can be in the format of "--gres=gpu:[1-8]", or "--gres=gpu:A5500:[1-8]" with the type included (optional)
#SBATCH --gres=gpu:1
#
# Standard output and error log
#SBATCH --output=serial_test_%j.log
#
# Wall clock limit in hrs:min:sec:
#SBATCH --time=00:00:30
#
## Command(s) to run (example):
pwd; hostname; date
echo "Running test script on a single CPU core"
sleep 5
echo "Test done!"
date
Keep the "SBATCH" lines commented; the scheduler reads them anyway. If you don't need a particular option, just leave it out of the file.
To submit the batch job:
% sbatch slurm-test.sh
Submitted batch job 7
The job(s) will then be scheduled. You can see the state of the queue like so:
% squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
7 batch weiler_t weiler R 0:07 1 phoenix-01
The job will write any STDOUT or STDERR to the directory you launched it from. Beyond that, it simply runs whatever commands the script contains, even if they produce no STDOUT.
== Launching Several Jobs at Once ==
You can launch many jobs at once using the $SLURM_ARRAY_TASK_ID variable. Add something like the following to your batch submission file:
#SBATCH --array=0-31
#SBATCH --output=array_job_%A_task_%a.out
#SBATCH --error=array_job_%A_task_%a.err
## Command(s) to run:
echo "I am task $SLURM_ARRAY_TASK_ID"
== CGROUPS and Resource Management ==
Our installation of Slurm uses Linux cgroups, which place a hard resource cap on jobs. If you declare that your job needs 4GB of RAM and it uses 5GB, it will fail with an OOM (out-of-memory) error. The same goes for CPU and GPU resources: if your job uses more than you requested, it will fail. Likewise with the "--time" option in the batch file: your job will be killed if it runs longer than the limit you specify. This keeps the nodes from crashing under runaway jobs that use more resources than you expect.
So... TEST YOUR JOBS! Find out how many resources a single job needs before you launch 100 of them.
996ddc55eb9bc76474e3a86d5ba0058e92487887
301
300
2023-05-03T20:19:52Z
Weiler
3
wikitext
text/x-wiki
When using Slurm, you will need to log into the Slurm head node (currently phoenix-01.gi.ucsc.edu, at present a one-node cluster). Once you have ssh'd in, you can execute Slurm batch or interactive commands.
== Submit a Slurm Batch Job ==
To submit a Slurm batch job, you will need to create a directory that you have read and write access to on all the nodes (which will often be a shared space). Let's say I have a batch named "experiment-1". I would create that directory in my group's area:
% mkdir -p /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1
% cd /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1
Then you will need to create your job submission batch file; it will look something like this. My file is called 'slurm-test.sh':
% vim slurm-test.sh
Then populate the file as necessary:
#!/bin/bash
# Job name:
#SBATCH --job-name=weiler_test
#
# Partition - This is the queue it goes in:
#SBATCH --partition=main
#
# Where to send email (optional)
#SBATCH --mail-user=weiler@ucsc.edu
#
# Number of nodes you need per job:
#SBATCH --nodes=1
#
# Memory needed for the jobs. Try very hard to make this accurate. DEFAULT = 4gb
#SBATCH --mem=4gb
#
# Number of tasks (one for each GPU desired for use case) (example):
#SBATCH --ntasks=1
#
# Processors per task:
# At least eight times the number of GPUs needed for nVidia RTX A5500
#SBATCH --cpus-per-task=1
#
# Number of GPUs, this can be in the format of "--gres=gpu:[1-8]", or "--gres=gpu:A5500:[1-8]" with the type included (optional)
#SBATCH --gres=gpu:1
#
# Standard output and error log
#SBATCH --output=serial_test_%j.log
#
# Wall clock limit in hrs:min:sec:
#SBATCH --time=00:00:30
#
## Command(s) to run (example):
pwd; hostname; date
echo "Running test script on a single CPU core"
sleep 5
echo "Test done!"
date
Keep the "SBATCH" lines commented; the scheduler reads them anyway. If you don't need a particular option, just leave it out of the file.
To submit the batch job:
% sbatch slurm-test.sh
Submitted batch job 7
The job(s) will then be scheduled. You can see the state of the queue like so:
% squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
7 batch weiler_t weiler R 0:07 1 phoenix-01
The job will write any STDOUT or STDERR to the directory you launched it from. Beyond that, it simply runs whatever commands the script contains, even if they produce no STDOUT.
== Launching Several Jobs at Once ==
You can launch many jobs at once using the $SLURM_ARRAY_TASK_ID variable. Add something like the following to your batch submission file:
#SBATCH --array=0-31
#SBATCH --output=array_job_%A_task_%a.out
#SBATCH --error=array_job_%A_task_%a.err
## Command(s) to run:
echo "I am task $SLURM_ARRAY_TASK_ID"
== CGROUPS and Resource Management ==
Our installation of Slurm uses Linux cgroups, which place a hard resource cap on jobs. If you declare that your job needs 4GB of RAM and it uses 5GB, it will fail with an OOM (out-of-memory) error. The same goes for CPU and GPU resources: if your job uses more than you requested, it will fail. Likewise with the "--time" option in the batch file: your job will be killed if it runs longer than the limit you specify. This keeps the nodes from crashing under runaway jobs that use more resources than you expect.
So... TEST YOUR JOBS! Find out how many resources a single job needs before you launch 100 of them.
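When scaling up after a successful test, it helps to derive the --array range from your input list instead of hard-coding it, so the number of tasks always matches the number of inputs. A pure-shell sketch (input-files.txt and its contents are hypothetical):

```shell
# Derive the --array range from the number of inputs (tasks numbered 0..N-1),
# so the array size always matches the input list exactly.
printf 'a.fa\nb.fa\nc.fa\nd.fa\n' > input-files.txt   # hypothetical inputs

N=$(wc -l < input-files.txt)
ARRAY_RANGE="0-$((N - 1))"
echo "#SBATCH --array=${ARRAY_RANGE}"
```

The printed line is what you would paste into (or generate for) your batch submission file.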
30a9b39b917061102cb017fd938e3f532b374dc8
Annotated Slurm Script
0
33
272
2023-03-09T01:48:13Z
Weiler
3
Created page with "[[Category:Scheduler]] This is a walk-through for a basic SLURM scheduler job script for a common case of a multi-threaded analysys. If the program you run is single-threaded..."
wikitext
text/x-wiki
[[Category:Scheduler]]
This is a walk-through of a basic SLURM scheduler job script for the common case of a multi-threaded analysis. If the program you run is single-threaded (can use only one CPU core), then use only the '--ntasks=1' line for the CPU request instead of all three listed lines. Annotations are marked with bullet points. You can click on the link below to download the raw job script file without the annotation. Values in brackets are placeholders; replace them with your own values, e.g. change '<JOBNAME>' to something like 'blast_proj22'. We will write additional documentation on more complex job layouts for MPI jobs and other situations where a simple number of processor cores is not sufficient.
{|cellspacing=30
|-style="vertical-align:top;"
|style="width: 50%"|
;Set the shell to use
<pre>
#!/bin/bash
</pre>
;Common arguments
* Name the job to make it easier to see in the job queue
<pre>
#SBATCH --job-name=<JOBNAME>
</pre>
;Email
:Your email address to use for all batch system communications
<pre>
##SBATCH --mail-user=<EMAIL>
##SBATCH --mail-user=<EMAIL-ONE>,<EMAIL-TWO>
</pre>
;What emails to send
:NONE - no emails
:ALL - all emails
:END,FAIL - only email if the job fails and email the summary at the end of the job
<pre>
#SBATCH --mail-type=FAIL,END
</pre>
;Standard Output and Error log files
:Use file patterns
:: %j - job id
:: %A-%a - Array job id (A) and task id (a)
:: You can also use --error for a separate stderr log
<pre>
#SBATCH --output <my_job-%j.out>
</pre>
;Number of nodes to use. For all non-MPI jobs this number will be equal to '1'
<pre>
#SBATCH --nodes=1
</pre>
;Number of tasks. For all non-MPI jobs this number will be equal to '1'
<pre>
#SBATCH --ntasks=1
</pre>
;Number of CPU cores to use. This number must match the argument used for the program you run.
<pre>
#SBATCH --cpus-per-task=4
</pre>
||
;Total memory limit for the job. Default is 2 gigabytes, but units can be specified with mb or gb for Megabytes or Gigabytes.
<pre>
#SBATCH --mem=4gb
</pre>
;Job run time in [DAYS-]HOURS:MINUTES:SECONDS
:[DAYS-] is optional; Slurm separates days from hours with a hyphen, e.g. 3-00:00:00
<pre>
#SBATCH --time=72:00:00
</pre>
;Optional:
:A group to use if you belong to multiple groups. Otherwise, do not use.
<pre>
#SBATCH --account=<GROUP>
</pre>
:A job array, which will create many jobs (called array tasks) different only in the '<code>$SLURM_ARRAY_TASK_ID</code>' variable, similar to [[Torque_Job_Arrays]] on HiPerGator 1
<pre>
#SBATCH --array=<BEGIN-END>
</pre>
;Example of five tasks
:<nowiki>#</nowiki>SBATCH --array=1-5
----
;Recommended convenient shell code to put into your job script
* Add host, time, and directory name for later troubleshooting
<pre>
date;hostname;pwd
</pre>
Below is the shell script part - the commands you will run to analyze your data. The following is an example.
* Load the software you need
<pre>
module load ncbi_blast
</pre>
* Run the program
<pre>
blastn -db nt -query input.fa -outfmt 6 -out results.xml -num_threads 4
date
</pre>
|}
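Put together, the annotated pieces above form a complete job script like the following sketch. The job name, log pattern, and BLAST inputs are illustrative placeholders filled in from the walk-through; the block just writes the script to a file so you can inspect it before submitting:

```shell
# Assemble the annotated pieces into one job script file.
# All values below are illustrative; substitute your own before submitting.
cat > blast_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=blast_proj22
#SBATCH --mail-type=FAIL,END
#SBATCH --output=blast_proj22-%j.out
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=4gb
#SBATCH --time=72:00:00

date;hostname;pwd
module load ncbi_blast
blastn -db nt -query input.fa -outfmt 6 -out results.xml -num_threads 4
date
EOF

echo "blast_job.sh is ready; submit it with: sbatch blast_job.sh"
```

Note that --cpus-per-task=4 matches the -num_threads 4 argument given to blastn, as the annotation above requires.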
94d1e9b77eaa1da471a51ad03d812d9c92abd6f0
275
272
2023-03-09T01:53:56Z
Weiler
3
wikitext
text/x-wiki
[[Category:Scheduler]]
This is a walk-through of a basic SLURM scheduler job script for the common case of a multi-threaded analysis. If the program you run is single-threaded (can use only one CPU core), then use only the '--ntasks=1' line for the CPU request instead of all three listed lines. Annotations are marked with bullet points. You can click on the link below to download the raw job script file without the annotation. Values in brackets are placeholders; replace them with your own values, e.g. change '<JOBNAME>' to something like 'blast_proj22'. We will write additional documentation on more complex job layouts for MPI jobs and other situations where a simple number of processor cores is not sufficient.
{|cellspacing=30
|-style="vertical-align:top;"
|style="width: 50%"|
;Set the shell to use
<pre>
#!/bin/bash
</pre>
;Common arguments
* Name the job to make it easier to see in the job queue
<pre>
#SBATCH --job-name=<JOBNAME>
</pre>
;Email
:Your email address for all batch system communications; either a single address or a comma-separated list (use one of the two example lines)
<pre>
#SBATCH --mail-user=<EMAIL>
#SBATCH --mail-user=<EMAIL-ONE>,<EMAIL-TWO>
</pre>
;GPUs
:How many GPUs your job requires; omit this line if the job does not use GPUs
<pre>
#SBATCH --gres=gpu:1
</pre>
;What emails to send
:NONE - no emails
:ALL - all emails
:END,FAIL - email only when the job ends or fails
<pre>
#SBATCH --mail-type=FAIL,END
</pre>
;Standard Output and Error log files
:Use file patterns
:: %j - job id
:: %A-%a - Array job id (A) and task id (a)
:: You can also use --error for a separate stderr log
<pre>
#SBATCH --output <my_job-%j.out>
</pre>
;Number of nodes to use. For all non-MPI jobs this number will be equal to '1'
<pre>
#SBATCH --nodes=1
</pre>
;Number of tasks. For all non-MPI jobs this number will be equal to '1'
<pre>
#SBATCH --ntasks=1
</pre>
;Number of CPU cores to use. This number must match the thread-count argument passed to the program you run.
<pre>
#SBATCH --cpus-per-task=4
</pre>
||
;Total memory limit for the job. The default is 2 gigabytes; units can be specified with mb or gb for megabytes or gigabytes.
<pre>
#SBATCH --mem=4gb
</pre>
;Job run time in [DAYS]:HOURS:MINUTES:SECONDS
:[DAYS] is optional; use it when convenient
<pre>
#SBATCH --time=72:00:00
</pre>
;Optional:
:The account (group) to charge if you belong to multiple groups; otherwise omit this line.
<pre>
#SBATCH --account=<GROUP>
</pre>
:A job array, which creates many jobs (called array tasks) that differ only in the '<code>$SLURM_ARRAY_TASK_ID</code>' variable, similar to [[Torque_Job_Arrays]] on HiPerGator 1
<pre>
#SBATCH --array=<BEGIN-END>
</pre>
;Example of five tasks
:<nowiki>#</nowiki>SBATCH --array=1-5
----
;Recommended convenient shell code to put into your job script
* Add host, time, and directory name for later troubleshooting
<pre>
date;hostname;pwd
</pre>
Below is the shell script part: the commands you will run to analyze your data. The following is an example.
* Load the software you need
<pre>
module load ncbi_blast
</pre>
* Run the program
<pre>
blastn -db nt -query input.fa -outfmt 6 -out results.tsv -num_threads 4
date
</pre>
|}
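With '--array=1-5', each array task sees a different value of <code>$SLURM_ARRAY_TASK_ID</code>, which the job script can use to pick its own input and output. A minimal sketch of this pattern, assuming hypothetical input_N.fa / results_N.tsv file names:

```shell
# Inside an array job script, SLURM_ARRAY_TASK_ID distinguishes the tasks.
# Slurm sets the variable; the default of 3 only lets the sketch run
# outside of Slurm. input_N.fa and results_N.tsv are hypothetical names.
SLURM_ARRAY_TASK_ID=${SLURM_ARRAY_TASK_ID:-3}
INPUT="input_${SLURM_ARRAY_TASK_ID}.fa"
OUTPUT="results_${SLURM_ARRAY_TASK_ID}.tsv"
echo "task ${SLURM_ARRAY_TASK_ID}: ${INPUT} -> ${OUTPUT}"
```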
1bdc8b4bb79b1c39ee2b97ad37f42f2aee0e4eb3
Job Arrays
0
34
282
2023-03-09T03:28:42Z
Weiler
3
Created page with "== Job Array Support == == Overview == Job arrays offer a mechanism for submitting and managing collections of similar jobs quickly and easily; job arrays with millions of t..."
wikitext
text/x-wiki
== Overview ==
Job arrays offer a mechanism for submitting and managing collections of similar jobs quickly and easily; job arrays with millions of tasks can be submitted in milliseconds (subject to configured size limits). All jobs must have the same initial options (e.g. size, time limit, etc.); however, it is possible to change some of these options after the job has begun execution by using the scontrol command, specifying either the JobID of the entire array or an individual array task ID.
<pre>
$ scontrol update job=101 ...
$ scontrol update job=101_1 ...
</pre>
Job arrays are only supported for batch jobs, and the array index values are specified using the --array or -a option of the sbatch command. The option argument can be specific array index values, a range of index values, and an optional step size, as shown in the examples below. Note that the minimum index value is zero and the maximum value is a Slurm configuration parameter (MaxArraySize minus one). Jobs which are part of a job array will have the environment variable SLURM_ARRAY_TASK_ID set to their individual array index values.
<pre>
# Submit a job array with index values between 0 and 31
$ sbatch --array=0-31 -N1 tmp
# Submit a job array with index values of 1, 3, 5 and 7
$ sbatch --array=1,3,5,7 -N1 tmp
# Submit a job array with index values between 1 and 7
# with a step size of 2 (i.e. 1, 3, 5 and 7)
$ sbatch --array=1-7:2 -N1 tmp
</pre>
A maximum number of simultaneously running tasks from the job array may be specified using a "%" separator. For example "--array=0-15%4" will limit the number of simultaneously running tasks from this job array to 4.
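The range-and-step syntax is simply an arithmetic sequence, so the index set an expression like '1-7:2' produces can be reproduced locally with seq (FIRST STEP LAST):

```shell
# The sbatch spec 1-7:2 means first index 1, step 2, last index 7.
# seq produces the same sequence of indices: 1 3 5 7.
seq 1 2 7
```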
== Job ID and Environment Variables ==
Job arrays will have several additional environment variables set. SLURM_ARRAY_JOB_ID will be set to the first job ID of the array. SLURM_ARRAY_TASK_ID will be set to the job array index value. SLURM_ARRAY_TASK_COUNT will be set to the number of tasks in the job array. SLURM_ARRAY_TASK_MAX will be set to the highest job array index value. SLURM_ARRAY_TASK_MIN will be set to the lowest job array index value. For example, a job submission of this sort
sbatch --array=1-3 -N1 tmp
will generate a job array containing three jobs. If the sbatch command responds
Submitted batch job 36
then the environment variables will be set as follows:
<pre>
SLURM_JOB_ID=36
SLURM_ARRAY_JOB_ID=36
SLURM_ARRAY_TASK_ID=1
SLURM_ARRAY_TASK_COUNT=3
SLURM_ARRAY_TASK_MAX=3
SLURM_ARRAY_TASK_MIN=1

SLURM_JOB_ID=37
SLURM_ARRAY_JOB_ID=36
SLURM_ARRAY_TASK_ID=2
SLURM_ARRAY_TASK_COUNT=3
SLURM_ARRAY_TASK_MAX=3
SLURM_ARRAY_TASK_MIN=1

SLURM_JOB_ID=38
SLURM_ARRAY_JOB_ID=36
SLURM_ARRAY_TASK_ID=3
SLURM_ARRAY_TASK_COUNT=3
SLURM_ARRAY_TASK_MAX=3
SLURM_ARRAY_TASK_MIN=1
</pre>
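One common use of these variables is splitting a fixed amount of work evenly across the array. The sketch below hard-codes the values from the three-task example above (task 2 of 3, with a hypothetical 100 work items) so it can run outside of Slurm:

```shell
# Partition 100 work items across the array using the variables above.
# Values are hard-coded to mimic task 2 of the three-task example;
# under Slurm they would come from the environment.
SLURM_ARRAY_TASK_ID=2
SLURM_ARRAY_TASK_COUNT=3
TOTAL=100
CHUNK=$(( (TOTAL + SLURM_ARRAY_TASK_COUNT - 1) / SLURM_ARRAY_TASK_COUNT ))  # ceiling division
START=$(( (SLURM_ARRAY_TASK_ID - 1) * CHUNK + 1 ))
END=$(( START + CHUNK - 1 ))
if [ "$END" -gt "$TOTAL" ]; then END=$TOTAL; fi
echo "task ${SLURM_ARRAY_TASK_ID}: items ${START}-${END}"
```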
All Slurm commands and APIs recognize the SLURM_JOB_ID value. Most commands also recognize the SLURM_ARRAY_JOB_ID plus SLURM_ARRAY_TASK_ID values separated by an underscore as identifying an element of a job array. Using the example above, "37" or "36_2" would be equivalent ways to identify the second array element of job 36. A set of APIs has been developed to operate on an entire job array or select tasks of a job array in a single function call. The function response consists of an array identifying the various error codes for various tasks of a job ID. For example the job_resume2() function might return an array of error codes indicating that tasks 1 and 2 have already completed; tasks 3 through 5 are resumed successfully, and tasks 6 through 99 have not yet started.
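Because the "36_2" form is plain text, scripts can compose or split it with ordinary shell parameter expansion; a small sketch:

```shell
# Split an <ArrayJobID>_<ArrayTaskID> identifier such as "36_2",
# the form most Slurm commands accept for one array element.
ID="36_2"
ARRAY_JOB=${ID%_*}   # strip the task suffix, leaving the array job ID
TASK=${ID#*_}        # strip the job prefix, leaving the task ID
echo "array job ${ARRAY_JOB}, task ${TASK}"
```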
== File Names ==
Two additional options are available to specify a job's stdin, stdout, and stderr file names: %A will be replaced by the value of SLURM_ARRAY_JOB_ID (as defined above) and %a will be replaced by the value of SLURM_ARRAY_TASK_ID (as defined above). The default output file format for a job array is "slurm-%A_%a.out". An example of explicit use of the formatting is:
sbatch -o slurm-%A_%a.out --array=1-3 -N1 tmp
which would generate output file names of this sort: "slurm-36_1.out", "slurm-36_2.out" and "slurm-36_3.out". If these file name options are used outside of a job array, "%A" will be replaced by the current job ID and "%a" will be replaced by 4,294,967,294 (equivalent to 0xfffffffe, or NO_VAL).
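Slurm performs the %A/%a substitution itself; the loop below only mimics it for array job 36 of the example, to show which file names result:

```shell
# Mimic the default array log pattern slurm-%A_%a.out for the
# three-task example (array job ID 36). Slurm itself performs this
# substitution; the loop just illustrates the resulting names.
SLURM_ARRAY_JOB_ID=36
for SLURM_ARRAY_TASK_ID in 1 2 3; do
  echo "slurm-${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}.out"
done
```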
== Scancel Command Use ==
If the job ID of a job array is specified as input to the scancel command then all elements of that job array will be cancelled. Alternately an array ID, optionally using regular expressions, may be specified for job cancellation.
<pre>
# Cancel array ID 1 to 3 from job array 20
$ scancel 20_[1-3]
# Cancel array ID 4 and 5 from job array 20
$ scancel 20_4 20_5
# Cancel all elements from job array 20
$ scancel 20
# Cancel the current job or job array element (if job array)
if [[ -z $SLURM_ARRAY_JOB_ID ]]; then
  scancel $SLURM_JOB_ID
else
  scancel ${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}
fi
</pre>
== Squeue Command Use ==
When a job array is submitted to Slurm, only one job record is created. Additional job records will only be created when the state of a task in the job array changes, typically when a task is allocated resources or its state is modified using the scontrol command. By default, the squeue command will report all of the tasks associated with a single job record on one line and use a regular expression to indicate the "array_task_id" values as shown below.
<pre>
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1080_[5-1024] debug tmp mac PD 0:00 1 (Resources)
1080_1 debug tmp mac R 0:17 1 tux0
1080_2 debug tmp mac R 0:16 1 tux1
1080_3 debug tmp mac R 0:03 1 tux2
1080_4 debug tmp mac R 0:03 1 tux3
</pre>
An option of "--array" or "-r" has also been added to the squeue command to print one job array element per line as shown below. The environment variable "SQUEUE_ARRAY" is equivalent to including the "--array" option on the squeue command line.
<pre>
$ squeue -r
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1082_3 debug tmp mac PD 0:00 1 (Resources)
1082_4 debug tmp mac PD 0:00 1 (Priority)
1080 debug tmp mac R 0:17 1 tux0
1081 debug tmp mac R 0:16 1 tux1
1082_1 debug tmp mac R 0:03 1 tux2
1082_2 debug tmp mac R 0:03 1 tux3
</pre>
The squeue --step/-s and --job/-j options can accept job or step specifications of the same format.
<pre>
$ squeue -j 1234_2,1234_3
...
$ squeue -s 1234_2.0,1234_3.0
...
</pre>
Two additional job output format field options have been added to squeue (all of the obvious letters were already assigned to other job fields):
:%F prints the array_job_id value
:%K prints the array_task_id value
== Scontrol Command Use ==
Use of the scontrol show job option shows two new fields related to job array support. The JobID is a unique identifier for the job. The ArrayJobID is the JobID of the first element of the job array. The ArrayTaskID is the array index of this particular entry, either a single number or an expression identifying the entries represented by this job record (e.g. "5-1024"). Neither field is displayed if the job is not part of a job array. The optional job ID specified with the scontrol show job or scontrol show step commands can identify job array elements by specifying ArrayJobId and ArrayTaskId with an underscore between them (e.g. <ArrayJobID>_<ArrayTaskId>).
The scontrol command will operate on all elements of a job array if the job ID specified is ArrayJobID. Individual job array tasks can be modified using the ArrayJobID_ArrayTaskID as shown below.
<pre>
$ sbatch --array=1-4 -J array ./sleepme 86400
Submitted batch job 21845
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST
21845_1 canopo array david R 0:13 1 dario
21845_2 canopo array david R 0:13 1 dario
21845_3 canopo array david R 0:13 1 dario
21845_4 canopo array david R 0:13 1 dario
$ scontrol update JobID=21845_2 name=arturo
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST
21845_1 canopo array david R 17:03 1 dario
21845_2 canopo arturo david R 17:03 1 dario
21845_3 canopo array david R 17:03 1 dario
21845_4 canopo array david R 17:03 1 dario
</pre>
The scontrol hold, holdu, release, requeue, requeuehold, suspend and resume commands can also either operate on all elements of a job array or individual elements as shown below.
<pre>
$ scontrol suspend 21845
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST
21845_1 canopo array david S 25:12 1 dario
21845_2 canopo arturo david S 25:12 1 dario
21845_3 canopo array david S 25:12 1 dario
21845_4 canopo array david S 25:12 1 dario
$ scontrol resume 21845
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST
21845_1 canopo array david R 25:14 1 dario
21845_2 canopo arturo david R 25:14 1 dario
21845_3 canopo array david R 25:14 1 dario
21845_4 canopo array david R 25:14 1 dario
$ scontrol suspend 21845_3
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST
21845_1 canopo array david R 25:14 1 dario
21845_2 canopo arturo david R 25:14 1 dario
21845_3 canopo array david S 25:14 1 dario
21845_4 canopo array david R 25:14 1 dario
$ scontrol resume 21845_3
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST
21845_1 canopo array david R 25:14 1 dario
21845_2 canopo arturo david R 25:14 1 dario
21845_3 canopo array david R 25:14 1 dario
21845_4 canopo array david R 25:14 1 dario
</pre>
== Job Dependencies ==
A job which is to be dependent upon an entire job array should specify itself dependent upon the ArrayJobID. Since each array element can have a different exit code, the interpretation of the afterok and afternotok clauses will be based upon the highest exit code from any task in the job array.
When a job dependency specifies the job ID of a job array:
* The after clause is satisfied after all tasks in the job array start.
* The afterany clause is satisfied after all tasks in the job array complete.
* The aftercorr clause is satisfied after the corresponding task ID in the specified job has completed successfully (ran to completion with an exit code of zero).
* The afterok clause is satisfied after all tasks in the job array complete successfully.
* The afternotok clause is satisfied after all tasks in the job array complete with at least one task not completing successfully.
Examples of use are shown below:
<pre>
# Wait for specific job array elements
sbatch --depend=after:123_4 my.job
sbatch --depend=afterok:123_4:123_8 my.job2
# Wait for entire job array to complete
sbatch --depend=afterany:123 my.job
# Wait for corresponding job array elements
sbatch --depend=aftercorr:123 my.job
# Wait for entire job array to complete successfully
sbatch --depend=afterok:123 my.job
# Wait for entire job array to complete and at least one task fails
sbatch --depend=afternotok:123 my.job
</pre>
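A driver script often captures the array's job ID and then submits a dependent cleanup step. In the sketch below, sbatch is replaced by a stub function so the control flow can run anywhere; on a real cluster you would drop the stub and read the ID from 'sbatch --parsable'. The job ID 123 and the file names are hypothetical.

```shell
# Submit an array, then a cleanup job that waits for every task to
# succeed. The sbatch stub echoes instead of submitting, and the
# job ID 123 stands in for what 'sbatch --parsable' would return.
sbatch() { echo "sbatch $*"; }
ARRAY_JOB=123
sbatch --array=1-10 my.job
sbatch --depend=afterok:${ARRAY_JOB} cleanup.job
```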
== Other Command Use ==
The following Slurm commands do not currently recognize job arrays and require Slurm job IDs, which are unique for each array element: sbcast, sprio, sreport, sshare and sstat. The sacct, sattach and strigger commands have been modified to permit specification of either job IDs or job array elements. The sview command has been modified to permit display of a job's ArrayJobId and ArrayTaskId fields. Both fields are displayed with a value of "N/A" if the job is not part of a job array.
== System Administration ==
A new configuration parameter has been added to control the maximum job array size: MaxArraySize. The smallest index that can be specified by a user is zero and the maximum index is MaxArraySize minus one. The default value of MaxArraySize is 1001. The maximum MaxArraySize supported in Slurm is 4000001. Be mindful about the value of MaxArraySize as job arrays offer an easy way for users to submit large numbers of jobs very quickly.
The sched/backfill plugin has been modified to improve performance with job arrays. Once one element of a job array is discovered to not be runnable or impact the scheduling of pending jobs, the remaining elements of that job array will be quickly skipped.
Slurm creates a single job record when a job array is submitted. Additional job records are only created as needed, typically when a task of a job array is started, which provides a very scalable mechanism to manage large job counts. Each task of the job array shares the same ArrayJobId but has its own unique ArrayTaskId. In addition to the ArrayJobId, each job gets a unique JobId assigned as its tasks are started.
417163ddf6bcc49c2f9e39549ed13b65f86ffbd9
284
283
2023-03-09T03:29:55Z
Weiler
3
/* Squeue Command Use */
wikitext
text/x-wiki
== Overview ==
Job arrays offer a mechanism for submitting and managing collections of similar jobs quickly and easily; job arrays with millions of tasks can be submitted in milliseconds (subject to configured size limits). All jobs must have the same initial options (e.g. size, time limit, etc.); however, it is possible to change some of these options after the job has begun execution using the scontrol command, specifying either the JobID of the array or an individual ArrayJobID:
$ scontrol update job=101 ...
$ scontrol update job=101_1 ...
Job arrays are only supported for batch jobs and the array index values are specified using the --array or -a option of the sbatch command. The option argument can be specific array index values, a range of index values, and an optional step size as shown in the examples below. Note that the minimum index value is zero and the maximum value is a Slurm configuration parameter (MaxArraySize minus one). Jobs which are part of a job array will have the environment variable SLURM_ARRAY_TASK_ID set to its array index value.
# Submit a job array with index values between 0 and 31
$ sbatch --array=0-31 -N1 tmp
# Submit a job array with index values of 1, 3, 5 and 7
$ sbatch --array=1,3,5,7 -N1 tmp
# Submit a job array with index values between 1 and 7
# with a step size of 2 (i.e. 1, 3, 5 and 7)
$ sbatch --array=1-7:2 -N1 tmp
A maximum number of simultaneously running tasks from the job array may be specified using a "%" separator. For example "--array=0-15%4" will limit the number of simultaneously running tasks from this job array to 4.
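To see how such a specification decomposes, the range and throttle parts can be split apart with plain bash parameter expansion. This is only an illustration of the syntax (the SPEC variable and the parsing are invented for the example; Slurm does this internally):

```shell
# Split a hypothetical array spec "0-15%4" into its range and
# throttle parts (illustration only).
SPEC="0-15%4"
RANGE=${SPEC%\%*}   # strip the throttle suffix, leaving "0-15"
LIMIT=${SPEC#*%}    # strip the range prefix, leaving "4"
echo "range=$RANGE limit=$LIMIT"
```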
== Job ID and Environment Variables ==
Job arrays will have additional environment variables set. SLURM_ARRAY_JOB_ID will be set to the first job ID of the array. SLURM_ARRAY_TASK_ID will be set to the job array index value. SLURM_ARRAY_TASK_COUNT will be set to the number of tasks in the job array. SLURM_ARRAY_TASK_MAX will be set to the highest job array index value. SLURM_ARRAY_TASK_MIN will be set to the lowest job array index value. For example, a job submission of this sort
sbatch --array=1-3 -N1 tmp
will generate a job array containing three jobs. If the sbatch command responds
Submitted batch job 36
then the environment variables will be set as follows:
SLURM_JOB_ID=36
SLURM_ARRAY_JOB_ID=36
SLURM_ARRAY_TASK_ID=1
SLURM_ARRAY_TASK_COUNT=3
SLURM_ARRAY_TASK_MAX=3
SLURM_ARRAY_TASK_MIN=1
SLURM_JOB_ID=37
SLURM_ARRAY_JOB_ID=36
SLURM_ARRAY_TASK_ID=2
SLURM_ARRAY_TASK_COUNT=3
SLURM_ARRAY_TASK_MAX=3
SLURM_ARRAY_TASK_MIN=1
SLURM_JOB_ID=38
SLURM_ARRAY_JOB_ID=36
SLURM_ARRAY_TASK_ID=3
SLURM_ARRAY_TASK_COUNT=3
SLURM_ARRAY_TASK_MAX=3
SLURM_ARRAY_TASK_MIN=1
All Slurm commands and APIs recognize the SLURM_JOB_ID value. Most commands also recognize the SLURM_ARRAY_JOB_ID plus SLURM_ARRAY_TASK_ID values separated by an underscore as identifying an element of a job array. Using the example above, "37" or "36_2" would be equivalent ways to identify the second array element of job 36. A set of APIs has been developed to operate on an entire job array or on select tasks of a job array in a single function call. The function response consists of an array identifying the error codes for the various tasks of a job ID. For example, the job_resume2() function might return an array of error codes indicating that tasks 1 and 2 had already completed, tasks 3 through 5 were resumed successfully, and tasks 6 through 99 had not yet started.
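Inside a running task, these variables are typically used to select per-task work. A minimal sketch (the input_N.txt naming convention is invented for the example, not anything Slurm provides):

```shell
# Pick a per-task input file from the array index. Slurm sets
# SLURM_ARRAY_TASK_ID inside each array task; default it to 1 here
# so the sketch also runs outside of Slurm.
TASK_ID=${SLURM_ARRAY_TASK_ID:-1}
INPUT="input_${TASK_ID}.txt"
echo "this task would process $INPUT"
```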
== File Names ==
Two additional options are available to specify a job's stdin, stdout, and stderr file names: %A will be replaced by the value of SLURM_ARRAY_JOB_ID (as defined above) and %a will be replaced by the value of SLURM_ARRAY_TASK_ID (as defined above). The default output file format for a job array is "slurm-%A_%a.out". An example of explicit use of the formatting is:
sbatch -o slurm-%A_%a.out --array=1-3 -N1 tmp
which would generate output file names of the form "slurm-36_1.out", "slurm-36_2.out" and "slurm-36_3.out". If these file name options are used for a job that is not part of a job array, then "%A" will be replaced by the current job ID and "%a" will be replaced by 4,294,967,294 (equivalent to 0xfffffffe or NO_VAL).
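The default pattern can be emulated in plain bash to see which file a given task will write (the IDs below reuse the job 36 example above):

```shell
# Emulate the default array output name "slurm-%A_%a.out"
# for array job 36, task 2 (values from the example above).
ARRAY_JOB_ID=36
TASK_ID=2
OUTFILE="slurm-${ARRAY_JOB_ID}_${TASK_ID}.out"
echo "$OUTFILE"
```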
== Scancel Command Use ==
If the job ID of a job array is specified as input to the scancel command then all elements of that job array will be cancelled. Alternately an array ID, optionally using regular expressions, may be specified for job cancellation.
# Cancel array ID 1 to 3 from job array 20
$ scancel 20_[1-3]
# Cancel array ID 4 and 5 from job array 20
$ scancel 20_4 20_5
# Cancel all elements from job array 20
$ scancel 20
# Cancel the current job or job array element (if job array)
if [[ -z $SLURM_ARRAY_JOB_ID ]]; then
scancel $SLURM_JOB_ID
else
scancel ${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}
fi
== Squeue Command Use ==
When a job array is submitted to Slurm, only one job record is created. Additional job records will only be created when the state of a task in the job array changes, typically when a task is allocated resources or its state is modified using the scontrol command. By default, the squeue command will report all of the tasks associated with a single job record on one line and use a regular expression to indicate the "array_task_id" values as shown below.
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1080_[5-1024] debug tmp mac PD 0:00 1 (Resources)
1080_1 debug tmp mac R 0:17 1 tux0
1080_2 debug tmp mac R 0:16 1 tux1
1080_3 debug tmp mac R 0:03 1 tux2
1080_4 debug tmp mac R 0:03 1 tux3
An option of "--array" or "-r" has also been added to the squeue command to print one job array element per line as shown below. The environment variable "SQUEUE_ARRAY" is equivalent to including the "--array" option on the squeue command line.
$ squeue -r
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1082_3 debug tmp mac PD 0:00 1 (Resources)
1082_4 debug tmp mac PD 0:00 1 (Priority)
1080 debug tmp mac R 0:17 1 tux0
1081 debug tmp mac R 0:16 1 tux1
1082_1 debug tmp mac R 0:03 1 tux2
1082_2 debug tmp mac R 0:03 1 tux3
The squeue --step/-s and --job/-j options can accept job or step specifications of the same format.
$ squeue -j 1234_2,1234_3
...
$ squeue -s 1234_2.0,1234_3.0
...
Two additional job output format field options have been added to squeue:
%F prints the array_job_id value
%K prints the array_task_id value
(all of the obvious letters to use were already assigned to other job fields).
== Scontrol Command Use ==
Use of the scontrol show job option shows two new fields related to job array support. The JobID is a unique identifier for the job. The ArrayJobID is the JobID of the first element of the job array. The ArrayTaskID is the array index of this particular entry, either a single number or an expression identifying the entries represented by this job record (e.g. "5-1024"). Neither field is displayed if the job is not part of a job array. The optional job ID specified with the scontrol show job or scontrol show step commands can identify job array elements by specifying ArrayJobId and ArrayTaskId with an underscore between them (e.g. <ArrayJobID>_<ArrayTaskId>).
The scontrol command will operate on all elements of a job array if the job ID specified is ArrayJobID. Individual job array tasks can be modified using the ArrayJobID_ArrayTaskID as shown below.
$ sbatch --array=1-4 -J array ./sleepme 86400
Submitted batch job 21845
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST
21845_1 canopo array david R 0:13 1 dario
21845_2 canopo array david R 0:13 1 dario
21845_3 canopo array david R 0:13 1 dario
21845_4 canopo array david R 0:13 1 dario
$ scontrol update JobID=21845_2 name=arturo
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST
21845_1 canopo array david R 17:03 1 dario
21845_2 canopo arturo david R 17:03 1 dario
21845_3 canopo array david R 17:03 1 dario
21845_4 canopo array david R 17:03 1 dario
The scontrol hold, holdu, release, requeue, requeuehold, suspend and resume commands can also either operate on all elements of a job array or individual elements as shown below.
$ scontrol suspend 21845
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST
21845_1 canopo array david S 25:12 1 dario
21845_2 canopo arturo david S 25:12 1 dario
21845_3 canopo array david S 25:12 1 dario
21845_4 canopo array david S 25:12 1 dario
$ scontrol resume 21845
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST
21845_1 canopo array david R 25:14 1 dario
21845_2 canopo arturo david R 25:14 1 dario
21845_3 canopo array david R 25:14 1 dario
21845_4 canopo array david R 25:14 1 dario
$ scontrol suspend 21845_3
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST
21845_1 canopo array david R 25:14 1 dario
21845_2 canopo arturo david R 25:14 1 dario
21845_3 canopo array david S 25:14 1 dario
21845_4 canopo array david R 25:14 1 dario
$ scontrol resume 21845_3
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST
21845_1 canopo array david R 25:14 1 dario
21845_2 canopo arturo david R 25:14 1 dario
21845_3 canopo array david R 25:14 1 dario
21845_4 canopo array david R 25:14 1 dario
== Job Dependencies ==
A job which is to be dependent upon an entire job array should specify itself dependent upon the ArrayJobID. Since each array element can have a different exit code, the interpretation of the afterok and afternotok clauses will be based upon the highest exit code from any task in the job array.
When a job dependency specifies the job ID of a job array:
* The after clause is satisfied after all tasks in the job array start.
* The afterany clause is satisfied after all tasks in the job array complete.
* The aftercorr clause is satisfied after the corresponding task ID in the specified job has completed successfully (ran to completion with an exit code of zero).
* The afterok clause is satisfied after all tasks in the job array complete successfully.
* The afternotok clause is satisfied after all tasks in the job array complete with at least one task not completing successfully.
Examples of use are shown below:
# Wait for specific job array elements
sbatch --depend=after:123_4 my.job
sbatch --depend=afterok:123_4:123_8 my.job2
# Wait for entire job array to complete
sbatch --depend=afterany:123 my.job
# Wait for corresponding job array elements
sbatch --depend=aftercorr:123 my.job
# Wait for entire job array to complete successfully
sbatch --depend=afterok:123 my.job
# Wait for entire job array to complete and at least one task fails
sbatch --depend=afternotok:123 my.job
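A common pattern is to capture the array's job ID at submission time and feed it to --depend. The sketch below stubs out sbatch with a shell function so it can run anywhere; on a real cluster you would replace submit with `sbatch --parsable`, which prints just the job ID (step1.sh and summarize.sh are hypothetical scripts):

```shell
# Stand-in for "sbatch --parsable", which prints only the job ID;
# replace with the real command on a cluster.
submit() { echo 123; }

ARRAY_ID=$(submit --array=0-9 step1.sh)
DEP="afterok:${ARRAY_ID}"
# On a real cluster: sbatch --depend=$DEP summarize.sh
echo "summary job would depend on $DEP"
```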
== Other Command Use ==
The following Slurm commands do not currently recognize job arrays, so using them requires Slurm job IDs, which are unique for each array element: sbcast, sprio, sreport, sshare and sstat. The sacct, sattach and strigger commands have been modified to permit specification of either job IDs or job array elements. The sview command has been modified to permit display of a job's ArrayJobId and ArrayTaskId fields. Both fields are displayed with a value of "N/A" if the job is not part of a job array.
== System Administration ==
A new configuration parameter has been added to control the maximum job array size: MaxArraySize. The smallest index that can be specified by a user is zero and the maximum index is MaxArraySize minus one. The default value of MaxArraySize is 1001. The maximum MaxArraySize supported in Slurm is 4000001. Be mindful about the value of MaxArraySize as job arrays offer an easy way for users to submit large numbers of jobs very quickly.
The sched/backfill plugin has been modified to improve performance with job arrays. Once one element of a job array is discovered to not be runnable or to impact the scheduling of pending jobs, the remaining elements of that job array will be quickly skipped.
Slurm creates a single job record when a job array is submitted. Additional job records are created only as needed, typically when a task of a job array is started, which provides a very scalable mechanism to manage large job counts. Each task of the job array will share the same ArrayJobId but will have its own unique ArrayTaskId. In addition to the ArrayJobId, each job will have a unique JobId assigned as the tasks are started.
218d0f2ac05979cb87b357beac108f1131dd1a0d
Quick Reference Guide
0
35
286
2023-03-09T03:39:31Z
Weiler
3
Created page with "== Job scheduling commands == {| class="wikitable" |- ! Commands ! Function ! Basic Usage ! Example |- ! sbatch ! submit a slurm job ! sbatch [script] ! $ sbatch job.sub |- !..."
wikitext
text/x-wiki
== Job scheduling commands ==
{| class="wikitable"
|-
! Commands
! Function
! Basic Usage
! Example
|-
| sbatch
| submit a slurm job
| sbatch [script]
| $ sbatch job.sub
|-
| scancel
| delete slurm batch job
| scancel [job_id]
| $ scancel 123456
|-
| scontrol hold
| hold slurm batch jobs
| scontrol hold [job_id]
| $ scontrol hold 123456
|-
| scontrol release
| release hold on slurm batch jobs
| scontrol release [job_id]
| $ scontrol release 123456
|}
== Job management commands ==
Job status commands:
* sinfo -a : list all queues
* squeue : list all jobs
* squeue -u userid : list jobs for userid
* squeue -t R : list running jobs
* smap : show jobs, partitions and nodes in a graphical network topology
== Job script basics ==
A typical job script will look like this:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
#SBATCH --time=02:00:00
#SBATCH --mem=128G
#SBATCH --mail-user=netid@gmail.com
#SBATCH --mail-type=begin
#SBATCH --mail-type=end
#SBATCH --error=JobName.%J.err
#SBATCH --output=JobName.%J.out
cd $SLURM_SUBMIT_DIR
module load modulename
your_commands_go_here
Lines starting with #SBATCH are read by the SLURM resource manager to request resources on the HPC cluster. Some important options are as follows:
{| class="wikitable"
|+ Batch file options
|-
! Option
! Examples
! Description
|-
| --nodes
| #SBATCH --nodes=1
| Number of nodes
|-
| --cpus-per-task
| #SBATCH --cpus-per-task=16
| Number of CPUs per node
|-
| --time
| #SBATCH --time=HH:MM:SS
| Total time requested for your job
|-
| --output
| #SBATCH --output=filename
| STDOUT to a file
|-
| --error
| #SBATCH --error=filename
| STDERR to a file
|-
| --mail-user
| #SBATCH --mail-user=user@domain.edu
| Email address to send notifications
|}
== Interactive session ==
To start an interactive session, execute the following:
#this command will give 1 Node for a time of 4 hours
srun -N 1 -t 4:00:00 --pty /bin/bash
== Getting information on past jobs ==
You can use the Slurm database to see how much memory your previous jobs used; e.g., the following command will report the requested memory and the peak resident and virtual memory used by a job:
sacct -j <JOBID> --format JobID,Partition,Submit,Start,End,NodeList%40,ReqMem,MaxRSS,MaxRSSNode,MaxRSSTask,MaxVMSize,ExitCode
== Aliases that provide useful information parsed from the SLURM commands ==
Place these aliases into your .bashrc:
alias si="sinfo -o \"%20P %5D %14F %8z %10m %10d %11l %16f %N\""
alias sq="squeue -o \"%8i %12j %4t %10u %20q %20a %10g %20P %10Q %5D %11l %11L %R\""
0b4db933930557789f1a2f44a2fafeee27080ba3
288
287
2023-03-09T03:52:20Z
Weiler
3
/* Advanced (but useful!) Commands */
wikitext
text/x-wiki
== General Commands ==
Get documentation on a command:
man <command>
Try the following commands:
man sbatch
man squeue
man scancel
== Submitting jobs ==
The following example script specifies a partition, time limit, memory allocation and number of cores. All your scripts should specify values for these four parameters. You can also set additional parameters as shown, such as job name and output file. This script performs a simple task: it generates a file of random numbers and then sorts it. A detailed explanation of the script is available here.
#!/bin/bash
#
#SBATCH -p shared # partition (queue)
#SBATCH -c 1 # number of cores
#SBATCH --mem 100 # memory pool for all cores
#SBATCH -t 0-2:00 # time (D-HH:MM)
#SBATCH -o slurm.%N.%j.out # STDOUT
#SBATCH -e slurm.%N.%j.err # STDERR
for i in {1..100000}; do
echo $RANDOM >> SomeRandomNumbers.txt
done
sort SomeRandomNumbers.txt
Now you can submit your job with the command:
sbatch myscript.sh
If you want to test your job and find out when your job is estimated to run use (note this does not actually submit the job):
sbatch --test-only myscript.sh
== Information on Jobs ==
List all current jobs for a user:
squeue -u <username>
List all running jobs for a user:
squeue -u <username> -t RUNNING
List all pending jobs for a user:
squeue -u <username> -t PENDING
List all current jobs in the shared partition for a user:
squeue -u <username> -p shared
List detailed information for a job (useful for troubleshooting):
scontrol show jobid -dd <jobid>
List status info for a currently running job:
sstat --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID -j <jobid> --allsteps
Once your job has completed, you can get additional information that was not available during the run. This includes run time, memory used, etc.
To get statistics on completed jobs by jobID:
sacct -j <jobid> --format=JobID,JobName,MaxRSS,Elapsed
To view the same information for all jobs of a user:
sacct -u <username> --format=JobID,JobName,MaxRSS,Elapsed
== Controlling jobs ==
To cancel one job:
scancel <jobid>
To cancel all the jobs for a user:
scancel -u <username>
To cancel all the pending jobs for a user:
scancel -t PENDING -u <username>
To cancel one or more jobs by name:
scancel --name myJobName
To hold a particular job from being scheduled:
scontrol hold <jobid>
To release a particular job to be scheduled:
scontrol release <jobid>
To requeue (cancel and rerun) a particular job:
scontrol requeue <jobid>
== Job arrays and useful commands ==
As shown in the commands above, it's easy to refer to one job by its job ID, or to all your jobs via your username. What if you want to refer to a subset of your jobs? The answer is to submit your job set as a job array. Then you can use the job array ID to refer to the set when running SLURM commands.
== SLURM job arrays ==
To cancel an indexed job in a job array:
scancel <jobid>_<index>
e.g.
scancel 1234_4
To find the original submit time for your job array:
sacct -j 32532756 -o submit -X --noheader | uniq
== Advanced (but useful!) Commands ==
The following commands work for individual jobs and for job arrays, and allow easy manipulation of large numbers of jobs. You can combine these commands with the parameters shown above to provide great flexibility and precision in job control. (Note that all of these commands are entered on one line)
Suspend all running jobs for a user (takes into account job arrays):
squeue -ho %A -u <username> -t R | xargs -n 1 scontrol suspend
Resume all suspended jobs for a user:
squeue -o "%.18A %.18t" -u <username> | awk '{if ($2 =="S"){print $1}}' | xargs -n 1 scontrol resume
After resuming, check if any are still suspended:
squeue -ho %A -u $USER -t S | wc -l
View Cluster State:
shost
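The awk state filter used in the resume pipeline above can be tried on canned squeue-style output without touching the cluster (the job IDs below are made up for the demo):

```shell
# Run the state filter from the resume pipeline on two canned
# columns (JOBID, ST); xargs joins the matching IDs on one line,
# just as it would hand them to scontrol resume.
SUSPENDED=$(printf '%s\n' "1001 S" "1002 R" "1003 S" \
  | awk '{if ($2 == "S") {print $1}}' | xargs)
echo "would resume: $SUSPENDED"
```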
76d656f29cff085775f0f8bb942767378d706fdf
GPU Resources
0
36
299
2023-05-02T05:28:39Z
Weiler
3
Created page with "When submitting jobs, you can ask for GPUs in one of two ways. One is: #SBATCH --gres=gpu:1 That will ask for 1 GPU generically on a node with a free GPU. This request is..."
wikitext
text/x-wiki
When submitting jobs, you can ask for GPUs in one of two ways. One is:
#SBATCH --gres=gpu:1
That will ask for 1 GPU generically on a node with a free GPU. This request is more specific:
#SBATCH --gres=gpu:A5500:3
That requests 3 A5500 GPUs '''only'''.
We have several GPU types on the cluster which may fit your specific needs:
* nVidia RTX A5500 : 24GB RAM
* nVidia A100 : 80GB RAM
* nVidia GeForce RTX 2080 Ti : 11GB RAM
* nVidia GeForce GTX 1080 Ti : 11GB RAM
e72da6a337984c1a4dd52bc867f4bd5cff15ff5d
Overview of using Slurm
0
32
338
302
2023-06-12T22:11:37Z
Weiler
3
wikitext
text/x-wiki
When using Slurm, you will need to log into the Slurm head node (currently phoenix.prism). Once you have ssh'd in, you can execute Slurm batch or interactive commands.
You might also want to consult the [[Quick Reference Guide]].
== Submit a Slurm Batch Job ==
In order to submit a Slurm batch job, you will need to create a directory that you have read and write access to on all the nodes (which will often be a shared space). Let's say I have a batch named "experiment-1". I would create that directory in my group's area:
% mkdir -p /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1
% cd /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1
Then you will need to create your job submission batch file. It will look something like this. My file is called 'slurm-test.sh':
% vim slurm-test.sh
Then populate the file as necessary:
#!/bin/bash
# Job name:
#SBATCH --job-name=weiler_test
#
# Partition - This is the queue it goes in:
#SBATCH --partition=main
#
# Where to send email (optional)
#SBATCH --mail-user=weiler@ucsc.edu
#
# Number of nodes you need per job:
#SBATCH --nodes=1
#
# Memory needed for the jobs. Try very hard to make this accurate. DEFAULT = 4gb
#SBATCH --mem=4gb
#
# Number of tasks (one for each GPU desired for use case) (example):
#SBATCH --ntasks=1
#
# Processors per task:
# At least eight times the number of GPUs needed for nVidia RTX A5500
#SBATCH --cpus-per-task=1
#
# Number of GPUs, this can be in the format of "--gres=gpu:[1-8]", or "--gres=gpu:A5500:[1-8]" with the type included (optional)
#SBATCH --gres=gpu:1
#
# Standard output and error log
#SBATCH --output=serial_test_%j.log
#
# Wall clock limit in hrs:min:sec:
#SBATCH --time=00:00:30
#
## Command(s) to run (example):
pwd; hostname; date
echo "Running test script on a single CPU core"
sleep 5
echo "Test done!"
date
Keep the "SBATCH" lines commented; the scheduler will read them anyway. If you don't need a particular option, just leave it out of the file.
To submit the batch job:
% sbatch slurm-test.sh
Submitted batch job 7
The job(s) will then be scheduled. You can see the state of the queue as such:
% squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
7 batch weiler_t weiler R 0:07 1 phoenix-01
The job will output any STDOUT or STDERR in the directory you launched the job from. Other than that, it will do whatever the job does, even if there is no STDOUT.
== Launching Several Jobs at Once ==
You can launch many jobs at once using the $SLURM_ARRAY_TASK_ID variable. Add something like the following to your batch submission file:
#SBATCH --array=0-31
#SBATCH --output=array_job_%A_task_%a.out
#SBATCH --error=array_job_%A_task_%a.err
## Command(s) to run:
echo "I am task $SLURM_ARRAY_TASK_ID"
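A common use of $SLURM_ARRAY_TASK_ID is to map each task to one line of a file listing the work to do. A small sketch, using an invented samples.txt (the zero-based index from --array=0-31 is shifted by one for sed's one-based line numbers):

```shell
# Map the array index to one line of a (hypothetical) samples file.
# Slurm sets SLURM_ARRAY_TASK_ID in each task; default to 0 here so
# the sketch also runs outside of Slurm.
TASK_ID=${SLURM_ARRAY_TASK_ID:-0}
printf 'sampleA\nsampleB\nsampleC\n' > samples.txt
SAMPLE=$(sed -n "$((TASK_ID + 1))p" samples.txt)
echo "task $TASK_ID processes $SAMPLE"
```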
== CGROUPS and Resource Management ==
Our installation of Slurm utilizes Linux CGROUPS, which put a hard resource cap on jobs. If you declare that your job needs 4GB of RAM and it uses 5GB, it will fail with an OOM exception. Likewise with CPU or GPU resources: if your job ends up using more than you specify, it will fail. The same goes for the "--time" batch file option: your job will fail if it runs longer than the time you specify. This is to keep the nodes from crashing from runaway jobs that use more resources than you think they will.
So... TEST YOUR JOBS! Find out how many resources a single job needs before you launch 100 of them.
== TEST YOUR JOBS! ==
Let me say that one more time: test your jobs before launching a bunch of them! If a job fails, you don't want it to fail 100 or more times. Testing also gives you a good idea of how much RAM and CPU each job will need so you can define your batch files appropriately.
Requirement for users to get GI VPN access
Before you are allowed access to our firewalled/secure area ("Prism" or "CIRM"), you must complete three items and provide the completed certificates or forms:
'''1''': You must complete the NIH Public Security Refresher Course online, in a single continuous sitting:
https://irtsectraining.nih.gov/public.aspx
Click on the "Enter Public Training Portal" near the bottom of the page. The course is titled "2022 Information Security, Insider Threats, Privacy Awareness, Records Management and Emergency Preparedness Refresher". At the end you will be able to save the completion certificate that should have your name on it.
'''2''': You need to sign the Genomics Institute VPN User Agreement (digital signature OK), located here for download:
[[Media:GI_VPN_Policy.pdf]]
'''3''': Please read and sign the last page of the NIH Genomic Data Sharing Policy agreement (digital signature OK), located here for download. By signing the document you affirm that you have read and understood the policies described therein and agree to abide by them:
[[Media:NIH_GDS_Policy.pdf]]
When you have the three documents described above ready, please complete this form: https://app.smartsheet.com/b/form/a76dbd90ba0240ab9ea9d39b390586ce.
There are two parts to this process.
1. For the user, please fill in ALL required fields '''and attach''' all three required documents described above.
2. For the Sponsor/PI - you will receive an email from Smartsheet. Please fill in all required fields and submit.
Once we receive your completed request, we will create your account and go over the details in a short Zoom meeting with you.
If you are on campus, you will need access to the "eduroam" wireless network '''prior''' to your Zoom appointment. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions that prevent it from functioning. Some other universities have such restrictions (notably UCSF), but most other university and home wireless networks should work fine.
Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop. You will need a laptop running OS X, Windows, or Ubuntu.
For Macs, please download and install '''Tunnelblick''' from https://tunnelblick.net/downloads.html. Select the Latest Stable version.
For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
For Ubuntu, please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
We will correspond with you via email to schedule the appointment. The Zoom meeting can take up to 30 minutes per person, depending on whether any issues come up during the software setup. If you show up for your appointment without one (or more) of the requirements outlined above, we will have to reschedule for a time after you have completed them.
'''PLEASE NOTE:''' Because of the overhead required in setting up VPN access, please only request access if you have an immediate need to work on data that exists behind the firewall. We have had a decent number of people request access and go through the setup but then never use it. In other words, please do not request access because "one day you might need it", but because you '''do''' actually need it!
'''ALSO NOTE:''' VPN accounts expire after one year from the date of first gaining access. To renew for another year you will need your PI/sponsor to send us a note asking for renewal.
GPU Resources
311
310
2023-05-15T19:37:30Z
Anovak
4
/* Running Containers in Docker */
wikitext
text/x-wiki
When submitting jobs, you can ask for GPUs in one of two ways. One is:
#SBATCH --gres=gpu:1
That will ask for 1 GPU generically on a node with a free GPU. This request is more specific:
#SBATCH --gres=gpu:A5500:3
That requests 3 A5500 GPUs '''only'''.
We have several GPU types on the cluster which may fit your specific needs:
nVidia RTX A5500 : 24GB RAM
nVidia A100 : 80GB RAM
nVidia GeForce RTX 2080 Ti : 11GB RAM
nVidia GeForce RTX 1080 Ti : 11GB RAM
==Running GPU Workloads==
To actually use an nVidia GPU, you need to run a program that uses the CUDA API. There are a few ways to obtain such a program.
===Prebuilt CUDA Applications===
The Slurm cluster nodes have the nVidia drivers installed, as well as basic CUDA tools like nvidia-smi.
Some projects, such as tensorflow, may ship pre-built binaries that can use CUDA. You should be able to run these binaries directly, if you download them.
===Building CUDA Applications===
The cluster nodes do not have the full CUDA Toolkit. In particular, they do not have the '''nvcc''' CUDA-enabled compiler. If you want to compile applications that use CUDA, you will need to install the development environment yourself for your user.
Once you have '''nvcc''' available to your user, building CUDA applications should work. To run them, you will have to submit them as jobs, because the head node does not have a GPU.
===Containerized GPU Workloads===
Instead of directly installing binaries, or installing and using the CUDA Toolkit, it is often easiest to use containers to download a prebuilt GPU workload and all of its libraries and dependencies. There are a few options for running containerized GPU workloads on the cluster.
====Running Containers in Singularity====
You can run containers on the cluster using Singularity, and give them access to GPUs using the '''--nv''' option. For example:
singularity pull docker://tensorflow/tensorflow:latest-gpu
srun -c 8 --mem 10G --gres=gpu:1 --exclude phoenix-01 singularity run --nv docker://tensorflow/tensorflow:latest-gpu python -c 'from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())'
This will produce output showing that the Tensorflow container is indeed able to talk to one GPU:
INFO: Using cached SIF image
2023-05-15 11:36:33.110850: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-15 11:36:38.799035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /device:GPU:0 with 22244 MB memory: -> device: 0, name: NVIDIA RTX A5500, pci bus id: 0000:03:00.0, compute capability: 8.6
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 8527638019084870106
xla_global_id: -1
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 23324655616
locality {
bus_id: 1
links {
}
}
incarnation: 1860154623440434360
physical_device_desc: "device: 0, name: NVIDIA RTX A5500, pci bus id: 0000:03:00.0, compute capability: 8.6"
xla_global_id: 416903419
]
====Running Containers in Slurm====
Slurm itself also supports a '''--container''' option for jobs, which allows a whole job to be run inside a container. If you are able to [https://slurm.schedmd.com/containers.html convert your container to OCI Bundle format], you can pass it directly to Slurm instead of using Singularity from inside the job. However, Docker-compatible image specifiers can't be given to Slurm, only paths to OCI bundles on disk, and the tools to download a Docker image from Docker Hub in OCI bundle format ('''skopeo''' and '''umoci''') are not installed on the cluster.
Slurm containers ''should'' have access to their assigned GPUs, but it is not clear if tools like '''nvidia-smi''' are injected into the container, as they would be with Singularity or the nVidia Container Runtime.
====Running Containers in Docker====
You might be used to running containers with Docker, or containerized GPU workloads with the nVidia Container Runtime or Toolkit. Docker is installed on all the nodes and the daemon is running; if the '''docker''' command does not work for you, ask cluster-admin to add you to the right groups.
We are working on getting the nVidia runtime set up to make the '''--gpus''' option to '''docker run''' work.
f19a18b08884f72c5ae0c935fd1faaad93e90eee
312
311
2023-05-15T19:45:19Z
Anovak
4
/* Running Containers in Slurm */
wikitext
text/x-wiki
When submitting jobs, you can ask for GPUs in one of two ways. One is:
#SBATCH --gres=gpu:1
That will ask for 1 GPU generically on a node with a free GPU. This request is more specific:
#SBATCH --gres=gpu:A5500:3
That requests 3 A5500 GPUs '''only'''.
We have several GPU types on the cluster which may fit your specific needs:
nVidia RTX A5500 : 24GB RAM
nVidia A100 : 80GB RAM
nVidia GeForce RTX 2080 Ti : 11GB RAM
nVidia GeForce RTX 1080 Ti : 11GB RAM
==Running GPU Workloads==
To actually use an nVidia GPU, you need to run a program that uses the CUDA API. There are a few ways to obtain such a program.
===Prebuilt CUDA Applications===
The Slurm cluster nodes have the nVidia drivers installed, as well as basic CUDA tools like nvidia-smi.
Some projects, such as tensorflow, may ship pre-built binaries that can use CUDA. You should be able to run these binaries directly, if you download them.
===Building CUDA Applications===
The cluster nodes do not have the full CUDA Toolkit. In particular, they do not have the '''nvcc''' CUDA-enabled compiler. If you want to compile applications that use CUDA, you will need to install the development environment yourself for your user.
Once you have '''nvcc''' available to your user, building CUDA applications should work. To run them, you will have to submit them as jobs, because the head node does not have a GPU.
===Containerized GPU Workloads===
Instead of directly installing binaries, or installing and using the CUDA Toolkit, it is often easiest to use containers to download a prebuilt GPU workload and all of its libraries and dependencies. There are a few options for running containerized GPU workloads on the cluster.
====Running Containers in Singularity====
You can run containers on the cluster using Singularity, and give them access to GPUs using the '''--nv''' option. For example:
singularity pull docker://tensorflow/tensorflow:latest-gpu
srun -c 8 --mem 10G --gres=gpu:1 --exclude phoenix-01 singularity run --nv docker://tensorflow/tensorflow:latest-gpu python -c 'from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())'
This will produce output showing that the Tensorflow container is indeed able to talk to one GPU:
INFO: Using cached SIF image
2023-05-15 11:36:33.110850: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-15 11:36:38.799035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /device:GPU:0 with 22244 MB memory: -> device: 0, name: NVIDIA RTX A5500, pci bus id: 0000:03:00.0, compute capability: 8.6
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 8527638019084870106
xla_global_id: -1
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 23324655616
locality {
bus_id: 1
links {
}
}
incarnation: 1860154623440434360
physical_device_desc: "device: 0, name: NVIDIA RTX A5500, pci bus id: 0000:03:00.0, compute capability: 8.6"
xla_global_id: 416903419
]
====Running Containers in Slurm====
Slurm itself also supports a '''--container''' option for jobs, which allows a whole job to be run inside a container. If you are able to [https://slurm.schedmd.com/containers.html convert your container to OCI Bundle format], you can pass it directly to Slurm instead of using Singularity from inside the job. However, Docker-compatible image specifiers can't be given to Slurm, only paths to OCI bundles on disk.
Stand-alone tools to download a Docker image from Docker Hub in OCI bundle format ('''skopeo''' and '''umoci''') are not yet installed on the cluster, but the method using the '''docker''' command should work.
Slurm containers ''should'' have access to their assigned GPUs, but it is not clear if tools like '''nvidia-smi''' are injected into the container, as they would be with Singularity or the nVidia Container Runtime.
====Running Containers in Docker====
You might be used to running containers with Docker, or containerized GPU workloads with the nVidia Container Runtime or Toolkit. Docker is installed on all the nodes and the daemon is running; if the `docker` command does not work for you, ask cluster-admin to add you to the right groups.
We are working on getting the nVidia runtime set up to make the '''--gpus''' option to '''docker run''' work.
5e3f910977d3b6e4bb5cf1b3bb7ac442f02e833b
313
312
2023-05-15T19:55:47Z
Anovak
4
/* Running Containers in Docker */
wikitext
text/x-wiki
When submitting jobs, you can ask for GPUs in one of two ways. One is:
#SBATCH --gres=gpu:1
That will ask for 1 GPU generically on a node with a free GPU. This request is more specific:
#SBATCH --gres=gpu:A5500:3
That requests 3 A5500 GPUs '''only'''.
We have several GPU types on the cluster which may fit your specific needs:
nVidia RTX A5500 : 24GB RAM
nVidia A100 : 80GB RAM
nVidia GeForce RTX 2080 Ti : 11GB RAM
nVidia GeForce RTX 1080 Ti : 11GB RAM
==Running GPU Workloads==
To actually use an nVidia GPU, you need to run a program that uses the CUDA API. There are a few ways to obtain such a program.
===Prebuilt CUDA Applications===
The Slurm cluster nodes have the nVidia drivers installed, as well as basic CUDA tools like nvidia-smi.
Some projects, such as tensorflow, may ship pre-built binaries that can use CUDA. You should be able to run these binaries directly, if you download them.
===Building CUDA Applications===
The cluster nodes do not have the full CUDA Toolkit. In particular, they do not have the '''nvcc''' CUDA-enabled compiler. If you want to compile applications that use CUDA, you will need to install the development environment yourself for your user.
Once you have '''nvcc''' available to your user, building CUDA applications should work. To run them, you will have to submit them as jobs, because the head node does not have a GPU.
===Containerized GPU Workloads===
Instead of directly installing binaries, or installing and using the CUDA Toolkit, it is often easiest to use containers to download a prebuilt GPU workload and all of its libraries and dependencies. There are a few options for running containerized GPU workloads on the cluster.
====Running Containers in Singularity====
You can run containers on the cluster using Singularity, and give them access to GPUs using the '''--nv''' option. For example:
singularity pull docker://tensorflow/tensorflow:latest-gpu
srun -c 8 --mem 10G --gres=gpu:1 --exclude phoenix-01 singularity run --nv docker://tensorflow/tensorflow:latest-gpu python -c 'from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())'
This will produce output showing that the Tensorflow container is indeed able to talk to one GPU:
INFO: Using cached SIF image
2023-05-15 11:36:33.110850: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-15 11:36:38.799035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /device:GPU:0 with 22244 MB memory: -> device: 0, name: NVIDIA RTX A5500, pci bus id: 0000:03:00.0, compute capability: 8.6
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 8527638019084870106
xla_global_id: -1
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 23324655616
locality {
bus_id: 1
links {
}
}
incarnation: 1860154623440434360
physical_device_desc: "device: 0, name: NVIDIA RTX A5500, pci bus id: 0000:03:00.0, compute capability: 8.6"
xla_global_id: 416903419
]
====Running Containers in Slurm====
Slurm itself also supports a '''--container''' option for jobs, which allows a whole job to be run inside a container. If you are able to [https://slurm.schedmd.com/containers.html convert your container to OCI Bundle format], you can pass it directly to Slurm instead of using Singularity from inside the job. However, Docker-compatible image specifiers can't be given to Slurm, only paths to OCI bundles on disk.
Stand-alone tools to download a Docker image from Docker Hub in OCI bundle format ('''skopeo''' and '''umoci''') are not yet installed on the cluster, but the method using the '''docker''' command should work.
Slurm containers ''should'' have access to their assigned GPUs, but it is not clear if tools like '''nvidia-smi''' are injected into the container, as they would be with Singularity or the nVidia Container Runtime.
====Running Containers in Docker====
You might be used to running containers with Docker, or containerized GPU workloads with the nVidia Container Runtime or Toolkit. Docker is installed on all the nodes and the daemon is running; if the '''docker''' command does not work for you, ask cluster-admin to add you to the right groups.
We are working on getting the nVidia runtime set up to make the '''--gpus''' option to '''docker run''' work.
390b8b7d950122a6dde08c541d077409df833ca1
321
313
2023-05-18T21:39:08Z
Anovak
4
wikitext
text/x-wiki
When submitting jobs, you can ask for GPUs in one of two ways. One is:
#SBATCH --gres=gpu:1
That will ask for 1 GPU generically on a node with a free GPU. This request is more specific:
#SBATCH --gres=gpu:A5500:3
That requests 3 A5500 GPUs '''only'''.
We have several GPU types on the cluster which may fit your specific needs:
nVidia RTX A5500 : 24GB RAM
nVidia A100 : 80GB RAM
nVidia GeForce RTX 2080 Ti : 11GB RAM
nVidia GeForce RTX 1080 Ti : 11GB RAM
==Running GPU Workloads==
To actually use an nVidia GPU, you need to run a program that uses the CUDA API. There are a few ways to obtain such a program.
===Prebuilt CUDA Applications===
The Slurm cluster nodes have the nVidia drivers installed, as well as basic CUDA tools like nvidia-smi.
Some projects, such as tensorflow, may ship pre-built binaries that can use CUDA. You should be able to run these binaries directly, if you download them.
===Building CUDA Applications===
The cluster nodes do not have the full CUDA Toolkit. In particular, they do not have the '''nvcc''' CUDA-enabled compiler. If you want to compile applications that use CUDA, you will need to install the development environment yourself for your user.
Once you have '''nvcc''' available to your user, building CUDA applications should work. To run them, you will have to submit them as jobs, because the head node does not have a GPU.
===Containerized GPU Workloads===
Instead of directly installing binaries, or installing and using the CUDA Toolkit, it is often easiest to use containers to download a prebuilt GPU workload and all of its libraries and dependencies. There are a few options for running containerized GPU workloads on the cluster.
====Running Containers in Singularity====
You can run containers on the cluster using Singularity, and give them access to GPUs using the '''--nv''' option. For example:
singularity pull docker://tensorflow/tensorflow:latest-gpu
srun -c 8 --mem 10G --gres=gpu:1 --exclude phoenix-01 singularity run --nv docker://tensorflow/tensorflow:latest-gpu python -c 'from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())'
This will produce output showing that the Tensorflow container is indeed able to talk to one GPU:
INFO: Using cached SIF image
2023-05-15 11:36:33.110850: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-15 11:36:38.799035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /device:GPU:0 with 22244 MB memory: -> device: 0, name: NVIDIA RTX A5500, pci bus id: 0000:03:00.0, compute capability: 8.6
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 8527638019084870106
xla_global_id: -1
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 23324655616
locality {
bus_id: 1
links {
}
}
incarnation: 1860154623440434360
physical_device_desc: "device: 0, name: NVIDIA RTX A5500, pci bus id: 0000:03:00.0, compute capability: 8.6"
xla_global_id: 416903419
]
====Running Containers in Slurm====
Slurm itself also supports a '''--container''' option for jobs, which allows a whole job to be run inside a container. If you are able to [https://slurm.schedmd.com/containers.html convert your container to OCI Bundle format], you can pass it directly to Slurm instead of using Singularity from inside the job. However, Docker-compatible image specifiers can't be given to Slurm, only paths to OCI bundles on disk.
Stand-alone tools to download a Docker image from Docker Hub in OCI bundle format ('''skopeo''' and '''umoci''') are not yet installed on the cluster, but the method using the '''docker''' command should work.
Slurm containers ''should'' have access to their assigned GPUs, but it is not clear if tools like '''nvidia-smi''' are injected into the container, as they would be with Singularity or the nVidia Container Runtime.
====Running Containers in Docker====
You might be used to running containers with Docker, or containerized GPU workloads with the nVidia Container Runtime or Toolkit. Docker is installed on all the nodes and the daemon is running; if the '''docker''' command does not work for you, ask cluster-admin to add you to the right groups.
The '''nvidia''' runtime is set up, so you should be able to do:
srun -c 1 --mem 4G --gres=gpu:1 docker run --rm --runtime=nvidia --gpus=1 nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi
You shouldn't need the '''--runtime''' argument in normal operation, just '''--gpus'''.
c08d91a196e44e6d05858c8205b2933b2aef2609
322
321
2023-05-18T21:43:01Z
Anovak
4
/* Running Containers in Docker */
wikitext
text/x-wiki
When submitting jobs, you can ask for GPUs in one of two ways. One is:
#SBATCH --gres=gpu:1
That will ask for 1 GPU generically on a node with a free GPU. This request is more specific:
#SBATCH --gres=gpu:A5500:3
That requests 3 A5500 GPUs '''only'''.
We have several GPU types on the cluster which may fit your specific needs:
nVidia RTX A5500 : 24GB RAM
nVidia A100 : 80GB RAM
nVidia GeForce RTX 2080 Ti : 11GB RAM
nVidia GeForce RTX 1080 Ti : 11GB RAM
==Running GPU Workloads==
To actually use an nVidia GPU, you need to run a program that uses the CUDA API. There are a few ways to obtain such a program.
===Prebuilt CUDA Applications===
The Slurm cluster nodes have the nVidia drivers installed, as well as basic CUDA tools like nvidia-smi.
Some projects, such as tensorflow, may ship pre-built binaries that can use CUDA. You should be able to run these binaries directly, if you download them.
===Building CUDA Applications===
The cluster nodes do not have the full CUDA Toolkit. In particular, they do not have the '''nvcc''' CUDA-enabled compiler. If you want to compile applications that use CUDA, you will need to install the development environment yourself for your user.
Once you have '''nvcc''' available to your user, building CUDA applications should work. To run them, you will have to submit them as jobs, because the head node does not have a GPU.
===Containerized GPU Workloads===
Instead of directly installing binaries, or installing and using the CUDA Toolkit, it is often easiest to use containers to download a prebuilt GPU workload and all of its libraries and dependencies. There are a few options for running containerized GPU workloads on the cluster.
====Running Containers in Singularity====
You can run containers on the cluster using Singularity, and give them access to GPUs using the '''--nv''' option. For example:
singularity pull docker://tensorflow/tensorflow:latest-gpu
srun -c 8 --mem 10G --gres=gpu:1 --exclude phoenix-01 singularity run --nv docker://tensorflow/tensorflow:latest-gpu python -c 'from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())'
This will produce output showing that the Tensorflow container is indeed able to talk to one GPU:
INFO: Using cached SIF image
2023-05-15 11:36:33.110850: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-15 11:36:38.799035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /device:GPU:0 with 22244 MB memory: -> device: 0, name: NVIDIA RTX A5500, pci bus id: 0000:03:00.0, compute capability: 8.6
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 8527638019084870106
xla_global_id: -1
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 23324655616
locality {
bus_id: 1
links {
}
}
incarnation: 1860154623440434360
physical_device_desc: "device: 0, name: NVIDIA RTX A5500, pci bus id: 0000:03:00.0, compute capability: 8.6"
xla_global_id: 416903419
]
====Running Containers in Slurm====
Slurm itself also supports a '''--container''' option for jobs, which allows a whole job to be run inside a container. If you are able to [https://slurm.schedmd.com/containers.html convert your container to OCI Bundle format], you can pass it directly to Slurm instead of using Singularity from inside the job. However, Docker-compatible image specifiers can't be given to Slurm, only paths to OCI bundles on disk.
Stand-alone tools to download a Docker image from Docker Hub in OCI bundle format ('''skopeo''' and '''umoci''') are not yet installed on the cluster, but the method using the '''docker''' command should work.
Slurm containers ''should'' have access to their assigned GPUs, but it is not clear if tools like '''nvidia-smi''' are injected into the container, as they would be with Singularity or the nVidia Container Runtime.
====Running Containers in Docker====
You might be used to running containers with Docker, or containerized GPU workloads with the nVidia Container Runtime or Toolkit. Docker is installed on all the nodes and the daemon is running; if the '''docker''' command does not work for you, ask cluster-admin to add you to the right groups.
The '''nvidia''' runtime is set up, so you should be able to do:
srun -c 1 --mem 4G --gres=gpu:1 docker run --rm --runtime=nvidia --gpus=1 nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi
You shouldn't need the '''--runtime''' argument in normal operation, just '''--gpus'''.
Further testing is needed to determine if Slurm is able to assign individual GPUs to individual jobs in a way that Docker respects.
275c508838cc1ab38ce0066010301de17308ed67
323
322
2023-05-22T15:07:02Z
Anovak
4
/* Running Containers in Docker */ Note how to point Docker at the right GPUs.
wikitext
text/x-wiki
When submitting jobs, you can ask for GPUs in one of two ways. One is:
#SBATCH --gres=gpu:1
That will ask for 1 GPU generically on a node with a free GPU. This request is more specific:
#SBATCH --gres=gpu:A5500:3
That requests 3 A5500 GPUs '''only'''.
We have several GPU types on the cluster which may fit your specific needs:
nVidia RTX A5500 : 24GB RAM
nVidia A100 : 80GB RAM
nVidia GeForce RTX 2080 Ti : 11GB RAM
nVidia GeForce RTX 1080 Ti : 11GB RAM
==Running GPU Workloads==
To actually use an nVidia GPU, you need to run a program that uses the CUDA API. There are a few ways to obtain such a program.
===Prebuilt CUDA Applications===
The Slurm cluster nodes have the nVidia drivers installed, as well as basic CUDA tools like nvidia-smi.
Some projects, such as tensorflow, may ship pre-built binaries that can use CUDA. You should be able to run these binaries directly, if you download them.
===Building CUDA Applications===
The cluster nodes do not have the full CUDA Toolkit. In particular, they do not have the '''nvcc''' CUDA-enabled compiler. If you want to compile applications that use CUDA, you will need to install the development environment yourself for your user.
Once you have '''nvcc''' available to your user, building CUDA applications should work. To run them, you will have to submit them as jobs, because the head node does not have a GPU.
===Containerized GPU Workloads===
Instead of directly installing binaries, or installing and using the CUDA Toolkit, it is often easiest to use containers to download a prebuilt GPU workload and all of its libraries and dependencies. There are a few options for running containerized GPU workloads on the cluster.
====Running Containers in Singularity====
You can run containers on the cluster using Singularity, and give them access to GPUs using the '''--nv''' option. For example:
singularity pull docker://tensorflow/tensorflow:latest-gpu
srun -c 8 --mem 10G --gres=gpu:1 --exclude phoenix-01 singularity run --nv docker://tensorflow/tensorflow:latest-gpu python -c 'from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())'
This will produce output showing that the Tensorflow container is indeed able to talk to one GPU:
INFO: Using cached SIF image
2023-05-15 11:36:33.110850: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-15 11:36:38.799035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /device:GPU:0 with 22244 MB memory: -> device: 0, name: NVIDIA RTX A5500, pci bus id: 0000:03:00.0, compute capability: 8.6
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 8527638019084870106
xla_global_id: -1
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 23324655616
locality {
bus_id: 1
links {
}
}
incarnation: 1860154623440434360
physical_device_desc: "device: 0, name: NVIDIA RTX A5500, pci bus id: 0000:03:00.0, compute capability: 8.6"
xla_global_id: 416903419
]
====Running Containers in Slurm====
Slurm itself also supports a '''--container''' option for jobs, which allows a whole job to be run inside a container. If you are able to [https://slurm.schedmd.com/containers.html convert your container to OCI Bundle format], you can pass it directly to Slurm instead of using Singularity from inside the job. However, Docker-compatible image specifiers can't be given to Slurm, only paths to OCI bundles on disk.
Stand-alone tools to download a Docker image from Docker Hub in OCI bundle format ('''skopeo''' and '''umoci''') are not yet installed on the cluster, but the method using the '''docker''' command should work.
Slurm containers ''should'' have access to their assigned GPUs, but it is not clear if tools like '''nvidia-smi''' are injected into the container, as they would be with Singularity or the nVidia Container Runtime.
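If you want to experiment with '''--container''' anyway, the Slurm containers documentation sketches how to build an OCI bundle from a Docker image using the '''docker''' command itself. Roughly (the image and paths here are placeholders, and this assumes '''runc''' is available on the node to generate the bundle's config.json):

```shell
# Unpack a Docker image's filesystem into an OCI bundle directory.
mkdir -p ~/oci_images/alpine/rootfs
docker pull alpine
docker create --name alpine-tmp alpine
docker export alpine-tmp | tar -C ~/oci_images/alpine/rootfs -xf -
docker rm alpine-tmp

# Generate a runtime config.json next to the rootfs, then run the bundle as a job.
cd ~/oci_images/alpine && runc spec --rootfs rootfs
srun --container ~/oci_images/alpine uptime
```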
====Running Containers in Docker====
You might be used to running containers with Docker, or containerized GPU workloads with the nVidia Container Runtime or Toolkit. Docker is installed on all the nodes and the daemon is running; if the '''docker''' command does not work for you, ask cluster-admin to add you to the right groups.
The '''nvidia''' runtime is set up and will automatically be used.
While Slurm configures each Slurm job with a cgroup that directs it to the correct GPUs, '''using Docker to run another container escapes Slurm's confinement''', and using '''--gpus=1''' will ''always'' use the ''first'' GPU in the system, whether that GPU is assigned to your job or not. When using Docker, you ''must'' consult the '''SLURM_STEP_GPUS''' environment variable and pass it along to your container. You should also impose limits on all other resources used by your Docker container, so that your whole job stays within the resources allocated by Slurm's scheduler. (TODO: find out how cgroups handle oversubscription between a Docker container and the Slurm container that launched it.)
An example of a working command is:
srun -c 1 --mem 4G --gres=gpu:2 bash -c 'docker run --rm --gpus=\"device=$SLURM_STEP_GPUS\" nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi'
Note that the double-quotes are included in the argument to '''--gpus''' as seen by the Docker client, and that '''bash''' and single-quotes are used to ensure that '''$SLURM_STEP_GPUS''' is evaluated within the job itself, and not on the head node.
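Building on that, you can also forward the job's CPU and memory allocation into Docker so the container stays within what Slurm granted. This is an untested sketch; it assumes the standard '''SLURM_CPUS_PER_TASK''' and '''SLURM_MEM_PER_NODE''' (megabytes) variables are set in your job, so check them before relying on this:

```shell
# Untested sketch: forward the Slurm allocation (GPUs, CPUs, memory) into Docker.
srun -c 2 --mem 8G --gres=gpu:2 bash -c 'docker run --rm \
    --gpus=\"device=$SLURM_STEP_GPUS\" \
    --cpus="$SLURM_CPUS_PER_TASK" \
    --memory="${SLURM_MEM_PER_NODE}m" \
    nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi'
```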
18a00e086417a11057cc592ae9e8ea28a086c059
324
323
2023-05-22T15:07:48Z
Anovak
4
/* Running Containers in Singularity */
wikitext
text/x-wiki
When submitting jobs, you can ask for GPUs in one of two ways. One is:
#SBATCH --gres=gpu:1
That will ask for 1 GPU generically on a node with a free GPU. This request is more specific:
#SBATCH --gres=gpu:A5500:3
That requests 3 A5500 GPUs '''only'''.
We have several GPU types on the cluster which may fit your specific needs:
nVidia RTX A5500 : 24GB RAM
nVidia A100 : 80GB RAM
nVidia GeForce RTX 2080 Ti : 11GB RAM
nVidia GeForce RTX 1080 Ti : 11GB RAM
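Put together, a minimal batch script requesting one specific GPU type looks like this (the job name and resource sizes are placeholders; adjust them to your workload):

```shell
#!/bin/bash
#SBATCH --job-name=gpu-test     # placeholder name
#SBATCH -c 4                    # CPU cores
#SBATCH --mem=16G               # host RAM, separate from the GPU's RAM
#SBATCH --gres=gpu:A5500:1      # exactly one A5500; use --gres=gpu:1 for any GPU

# The nodes have the nVidia drivers installed, so this shows the assigned GPU.
nvidia-smi
```

Submit it with '''sbatch''' and check the job's output file for the nvidia-smi report.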
==Running GPU Workloads==
To actually use an nVidia GPU, you need to run a program that uses the CUDA API. There are a few ways to obtain such a program.
===Prebuilt CUDA Applications===
The Slurm cluster nodes have the nVidia drivers installed, as well as basic CUDA tools like nvidia-smi.
Some projects, such as tensorflow, may ship pre-built binaries that can use CUDA. You should be able to run these binaries directly, if you download them.
===Building CUDA Applications===
The cluster nodes do not have the full CUDA Toolkit. In particular, they do not have the '''nvcc''' CUDA-enabled compiler. If you want to compile applications that use CUDA, you will need to install the development environment yourself for your user.
Once you have '''nvcc''' available to your user, building CUDA applications should work. To run them, you will have to submit them as jobs, because the head node does not have a GPU.
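One unprivileged way to get '''nvcc''' is a per-user conda environment. This sketch assumes you already have conda (e.g. Miniconda) set up in your home directory; '''saxpy.cu''' is a placeholder source file:

```shell
# Install the CUDA development tools into your own environment (no root needed).
conda create -n cuda -c nvidia cuda-toolkit
conda activate cuda

# Compiling works on the head node, since nvcc does not need a GPU...
nvcc -o saxpy saxpy.cu

# ...but the compiled program must run as a job, because only the compute nodes have GPUs.
srun -c 1 --mem 4G --gres=gpu:1 ./saxpy
```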
===Containerized GPU Workloads===
Instead of directly installing binaries, or installing and using the CUDA Toolkit, it is often easiest to use containers to download a prebuilt GPU workload and all of its libraries and dependencies. There are a few options for running containerized GPU workloads on the cluster.
====Running Containers in Singularity====
You can run containers on the cluster using Singularity, and give them access to GPUs using the '''--nv''' option. For example:
singularity pull docker://tensorflow/tensorflow:latest-gpu
srun -c 8 --mem 10G --gres=gpu:1 singularity run --nv docker://tensorflow/tensorflow:latest-gpu python -c 'from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())'
This will produce output showing that the Tensorflow container is indeed able to talk to one GPU:
INFO: Using cached SIF image
2023-05-15 11:36:33.110850: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-15 11:36:38.799035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /device:GPU:0 with 22244 MB memory: -> device: 0, name: NVIDIA RTX A5500, pci bus id: 0000:03:00.0, compute capability: 8.6
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 8527638019084870106
xla_global_id: -1
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 23324655616
locality {
bus_id: 1
links {
}
}
incarnation: 1860154623440434360
physical_device_desc: "device: 0, name: NVIDIA RTX A5500, pci bus id: 0000:03:00.0, compute capability: 8.6"
xla_global_id: 416903419
]
====Running Containers in Slurm====
Slurm itself also supports a '''--container''' option for jobs, which allows a whole job to be run inside a container. If you are able to [https://slurm.schedmd.com/containers.html convert your container to OCI Bundle format], you can pass it directly to Slurm instead of using Singularity from inside the job. However, Docker-compatible image specifiers can't be given to Slurm, only paths to OCI bundles on disk.
Stand-alone tools to download a Docker image from Docker Hub in OCI bundle format ('''skopeo''' and '''umoci''') are not yet installed on the cluster, but the method using the '''docker''' command should work.
Slurm containers ''should'' have access to their assigned GPUs, but it is not clear if tools like '''nvidia-smi''' are injected into the container, as they would be with Singularity or the nVidia Container Runtime.
====Running Containers in Docker====
You might be used to running containers with Docker, or containerized GPU workloads with the nVidia Container Runtime or Toolkit. Docker is installed on all the nodes and the daemon is running; if the '''docker''' command does not work for you, ask cluster-admin to add you to the right groups.
The '''nvidia''' runtime is set up and will automatically be used.
While Slurm configures each Slurm job with a cgroup that directs it to the correct GPUs, '''using Docker to run another container escapes Slurm's confinement''', and using '''--gpus=1''' will ''always'' use the ''first'' GPU in the system, whether or not that GPU is assigned to your job. When using Docker, you ''must'' consult the '''SLURM_STEP_GPUS''' environment variable and pass it along to your container. You should also impose limits on all other resources used by your Docker container, so that your whole job stays within the resources allocated by Slurm's scheduler. (TODO: find out how cgroups handle oversubscription between a Docker container and the Slurm container that launched it.)
An example of a working command is:
srun -c 1 --mem 4G --gres=gpu:2 bash -c 'docker run --rm --gpus=\"device=$SLURM_STEP_GPUS\" nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi'
Note that the double-quotes are included in the argument to '''--gpus''' as seen by the Docker client, and that '''bash''' and single-quotes are used to ensure that '''$SLURM_STEP_GPUS''' is evaluated within the job itself, and not on the head node.
b672735e7e45296fdeb3900d1ee2d172d0729ee3
325
324
2023-05-22T15:11:47Z
Anovak
4
/* Running Containers in Singularity */
wikitext
text/x-wiki
When submitting jobs, you can ask for GPUs in one of two ways. One is:
#SBATCH --gres=gpu:1
That will ask for 1 GPU generically on a node with a free GPU. This request is more specific:
#SBATCH --gres=gpu:A5500:3
That requests 3 A5500 GPUs '''only'''.
We have several GPU types on the cluster which may fit your specific needs:
nVidia RTX A5500 : 24GB RAM
nVidia A100 : 80GB RAM
nVidia GeForce RTX 2080 Ti : 11GB RAM
nVidia GeForce RTX 1080 Ti : 11GB RAM
==Running GPU Workloads==
To actually use an nVidia GPU, you need to run a program that uses the CUDA API. There are a few ways to obtain such a program.
===Prebuilt CUDA Applications===
The Slurm cluster nodes have the nVidia drivers installed, as well as basic CUDA tools like nvidia-smi.
Some projects, such as tensorflow, may ship pre-built binaries that can use CUDA. You should be able to run these binaries directly, if you download them.
===Building CUDA Applications===
The cluster nodes do not have the full CUDA Toolkit. In particular, they do not have the '''nvcc''' CUDA-enabled compiler. If you want to compile applications that use CUDA, you will need to install the development environment yourself for your user.
Once you have '''nvcc''' available to your user, building CUDA applications should work. To run them, you will have to submit them as jobs, because the head node does not have a GPU.
===Containerized GPU Workloads===
Instead of directly installing binaries, or installing and using the CUDA Toolkit, it is often easiest to use containers to download a prebuilt GPU workload and all of its libraries and dependencies. There are a few options for running containerized GPU workloads on the cluster.
====Running Containers in Singularity====
You can run containers on the cluster using Singularity, and give them access to the GPUs that Slurm has selected using the '''--nv''' option. For example:
singularity pull docker://tensorflow/tensorflow:latest-gpu
srun -c 8 --mem 10G --gres=gpu:1 singularity run --nv docker://tensorflow/tensorflow:latest-gpu python -c 'from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())'
This will produce output showing that the Tensorflow container is indeed able to talk to one GPU:
INFO: Using cached SIF image
2023-05-15 11:36:33.110850: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-15 11:36:38.799035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /device:GPU:0 with 22244 MB memory: -> device: 0, name: NVIDIA RTX A5500, pci bus id: 0000:03:00.0, compute capability: 8.6
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 8527638019084870106
xla_global_id: -1
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 23324655616
locality {
bus_id: 1
links {
}
}
incarnation: 1860154623440434360
physical_device_desc: "device: 0, name: NVIDIA RTX A5500, pci bus id: 0000:03:00.0, compute capability: 8.6"
xla_global_id: 416903419
]
Slurm's containment of the Slurm job to the correct set of GPUs is also passed through to the Singularity container; there is no need to specifically direct Singularity to use the right GPUs unless you are doing something unusual.
====Running Containers in Slurm====
Slurm itself also supports a '''--container''' option for jobs, which allows a whole job to be run inside a container. If you are able to [https://slurm.schedmd.com/containers.html convert your container to OCI Bundle format], you can pass it directly to Slurm instead of using Singularity from inside the job. However, Docker-compatible image specifiers can't be given to Slurm, only paths to OCI bundles on disk.
Stand-alone tools to download a Docker image from Docker Hub in OCI bundle format ('''skopeo''' and '''umoci''') are not yet installed on the cluster, but the method using the '''docker''' command should work.
Slurm containers ''should'' have access to their assigned GPUs, but it is not clear if tools like '''nvidia-smi''' are injected into the container, as they would be with Singularity or the nVidia Container Runtime.
====Running Containers in Docker====
You might be used to running containers with Docker, or containerized GPU workloads with the nVidia Container Runtime or Toolkit. Docker is installed on all the nodes and the daemon is running; if the '''docker''' command does not work for you, ask cluster-admin to add you to the right groups.
The '''nvidia''' runtime is set up and will automatically be used.
While Slurm configures each Slurm job with a cgroup that directs it to the correct GPUs, '''using Docker to run another container escapes Slurm's confinement''', and using '''--gpus=1''' will ''always'' use the ''first'' GPU in the system, whether or not that GPU is assigned to your job. When using Docker, you ''must'' consult the '''SLURM_STEP_GPUS''' environment variable and pass it along to your container. You should also impose limits on all other resources used by your Docker container, so that your whole job stays within the resources allocated by Slurm's scheduler. (TODO: find out how cgroups handle oversubscription between a Docker container and the Slurm container that launched it.)
An example of a working command is:
srun -c 1 --mem 4G --gres=gpu:2 bash -c 'docker run --rm --gpus=\"device=$SLURM_STEP_GPUS\" nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi'
Note that the double-quotes are included in the argument to '''--gpus''' as seen by the Docker client, and that '''bash''' and single-quotes are used to ensure that '''$SLURM_STEP_GPUS''' is evaluated within the job itself, and not on the head node.
b6c40aace5dddc1f64b38971cc6e64b6d9b3d2c2
326
325
2023-05-22T15:19:32Z
Anovak
4
wikitext
text/x-wiki
When submitting jobs, you can ask for GPUs in one of two ways. One is:
#SBATCH --gres=gpu:1
That will ask for 1 GPU generically on a node with a free GPU. This request is more specific:
#SBATCH --gres=gpu:A5500:3
That requests 3 A5500 GPUs '''only'''.
We have several GPU types on the cluster which may fit your specific needs:
nVidia RTX A5500 : 24GB RAM
nVidia A100 : 80GB RAM
nVidia GeForce RTX 2080 Ti : 11GB RAM
nVidia GeForce RTX 1080 Ti : 11GB RAM
For the most part, Slurm takes care of making sure that each job only sees and uses the GPUs assigned to it. Within the job, '''CUDA_VISIBLE_DEVICES''' will be set in the environment, but it will always be set to a list of your requested number of GPUs, starting at 0. Slurm re-numbers the GPUs assigned to each job to appear to start at 0, within the job. If you need access to the "real" GPU numbers (to log or to pass along to Docker), they are available in the '''SLURM_STEP_GPUS''' environment variable.
==Running GPU Workloads==
To actually use an nVidia GPU, you need to run a program that uses the CUDA API. There are a few ways to obtain such a program.
===Prebuilt CUDA Applications===
The Slurm cluster nodes have the nVidia drivers installed, as well as basic CUDA tools like nvidia-smi.
Some projects, such as tensorflow, may ship pre-built binaries that can use CUDA. You should be able to run these binaries directly, if you download them.
===Building CUDA Applications===
The cluster nodes do not have the full CUDA Toolkit. In particular, they do not have the '''nvcc''' CUDA-enabled compiler. If you want to compile applications that use CUDA, you will need to install the development environment yourself for your user.
Once you have '''nvcc''' available to your user, building CUDA applications should work. To run them, you will have to submit them as jobs, because the head node does not have a GPU.
===Containerized GPU Workloads===
Instead of directly installing binaries, or installing and using the CUDA Toolkit, it is often easiest to use containers to download a prebuilt GPU workload and all of its libraries and dependencies. There are a few options for running containerized GPU workloads on the cluster.
====Running Containers in Singularity====
You can run containers on the cluster using Singularity, and give them access to the GPUs that Slurm has selected using the '''--nv''' option. For example:
singularity pull docker://tensorflow/tensorflow:latest-gpu
srun -c 8 --mem 10G --gres=gpu:1 singularity run --nv docker://tensorflow/tensorflow:latest-gpu python -c 'from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())'
This will produce output showing that the Tensorflow container is indeed able to talk to one GPU:
INFO: Using cached SIF image
2023-05-15 11:36:33.110850: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-15 11:36:38.799035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /device:GPU:0 with 22244 MB memory: -> device: 0, name: NVIDIA RTX A5500, pci bus id: 0000:03:00.0, compute capability: 8.6
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 8527638019084870106
xla_global_id: -1
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 23324655616
locality {
bus_id: 1
links {
}
}
incarnation: 1860154623440434360
physical_device_desc: "device: 0, name: NVIDIA RTX A5500, pci bus id: 0000:03:00.0, compute capability: 8.6"
xla_global_id: 416903419
]
Slurm's containment of the Slurm job to the correct set of GPUs is also passed through to the Singularity container; there is no need to specifically direct Singularity to use the right GPUs unless you are doing something unusual.
====Running Containers in Slurm====
Slurm itself also supports a '''--container''' option for jobs, which allows a whole job to be run inside a container. If you are able to [https://slurm.schedmd.com/containers.html convert your container to OCI Bundle format], you can pass it directly to Slurm instead of using Singularity from inside the job. However, Docker-compatible image specifiers can't be given to Slurm, only paths to OCI bundles on disk.
Stand-alone tools to download a Docker image from Docker Hub in OCI bundle format ('''skopeo''' and '''umoci''') are not yet installed on the cluster, but the method using the '''docker''' command should work.
Slurm containers ''should'' have access to their assigned GPUs, but it is not clear if tools like '''nvidia-smi''' are injected into the container, as they would be with Singularity or the nVidia Container Runtime.
====Running Containers in Docker====
You might be used to running containers with Docker, or containerized GPU workloads with the nVidia Container Runtime or Toolkit. Docker is installed on all the nodes and the daemon is running; if the '''docker''' command does not work for you, ask cluster-admin to add you to the right groups.
The '''nvidia''' runtime is set up and will automatically be used.
While Slurm configures each Slurm job with a cgroup that directs it to the correct GPUs, '''using Docker to run another container escapes Slurm's confinement''', and using '''--gpus=1''' will ''always'' use the ''first'' GPU in the system, whether or not that GPU is assigned to your job. When using Docker, you ''must'' consult the '''SLURM_STEP_GPUS''' environment variable and pass it along to your container. You should also impose limits on all other resources used by your Docker container, so that your whole job stays within the resources allocated by Slurm's scheduler. (TODO: find out how cgroups handle oversubscription between a Docker container and the Slurm container that launched it.)
An example of a working command is:
srun -c 1 --mem 4G --gres=gpu:2 bash -c 'docker run --rm --gpus=\"device=$SLURM_STEP_GPUS\" nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi'
Note that the double-quotes are included in the argument to '''--gpus''' as seen by the Docker client, and that '''bash''' and single-quotes are used to ensure that '''$SLURM_STEP_GPUS''' is evaluated within the job itself, and not on the head node.
eb5bb2a3585f4bad0338c3acbe4867f99aff485f
327
326
2023-05-22T19:32:57Z
Anovak
4
wikitext
text/x-wiki
When submitting jobs, you can ask for GPUs in one of two ways. One is:
#SBATCH --gres=gpu:1
That will ask for 1 GPU generically on a node with a free GPU. This request is more specific:
#SBATCH --gres=gpu:A5500:3
That requests 3 A5500 GPUs '''only'''.
We have several GPU types on the cluster which may fit your specific needs:
nVidia RTX A5500 : 24GB RAM
nVidia A100 : 80GB RAM
nVidia GeForce RTX 2080 Ti : 11GB RAM
nVidia GeForce RTX 1080 Ti : 11GB RAM
For the most part, Slurm takes care of making sure that each job only sees and uses the GPUs assigned to it. Within the job, '''CUDA_VISIBLE_DEVICES''' will be set in the environment, but it will always be set to a list of your requested number of GPUs, starting at 0. Slurm re-numbers the GPUs assigned to each job to appear to start at 0, within the job. If you need access to the "real" GPU numbers (to log or to pass along to Docker), they are available in the '''SLURM_JOB_GPUS''' (for '''sbatch''') or '''SLURM_STEP_GPUS''' (for '''srun''') environment variable.
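A quick way to see this renumbering in action is to print both variables from inside a job (a sketch; the CPU and memory requests here are arbitrary):

```shell
# Ask for one GPU and print both views of the device numbering.
# CUDA_VISIBLE_DEVICES is renumbered to start at 0 within the job;
# SLURM_STEP_GPUS holds the node's "real" GPU index.
srun -c 1 --mem 1G --gres=gpu:1 bash -c \
  'echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"; echo "SLURM_STEP_GPUS=$SLURM_STEP_GPUS"'
```

As with the Docker example below on this page, the single quotes ensure the variables are expanded on the compute node rather than on the head node.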
==Running GPU Workloads==
To actually use an nVidia GPU, you need to run a program that uses the CUDA API. There are a few ways to obtain such a program.
===Prebuilt CUDA Applications===
The Slurm cluster nodes have the nVidia drivers installed, as well as basic CUDA tools like nvidia-smi.
Some projects, such as tensorflow, may ship pre-built binaries that can use CUDA. You should be able to run these binaries directly, if you download them.
===Building CUDA Applications===
The cluster nodes do not have the full CUDA Toolkit. In particular, they do not have the '''nvcc''' CUDA-enabled compiler. If you want to compile applications that use CUDA, you will need to install the development environment yourself for your user.
Once you have '''nvcc''' available to your user, building CUDA applications should work. To run them, you will have to submit them as jobs, because the head node does not have a GPU.
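One way to get '''nvcc''' without root access is a per-user install through conda. This is only a sketch: it assumes you already have conda/miniconda set up, and the channel and package names ('''nvidia'''/'''cuda-toolkit''') may change between releases, so check NVIDIA's current install instructions.

```shell
# Per-user CUDA Toolkit install via conda (no root needed).
# The channel/package names here are an assumption; verify against
# NVIDIA's current installation documentation.
conda create -n cuda -c nvidia cuda-toolkit
conda activate cuda
nvcc --version   # nvcc should now be on your PATH
```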
===Containerized GPU Workloads===
Instead of directly installing binaries, or installing and using the CUDA Toolkit, it is often easiest to use containers to download a prebuilt GPU workload and all of its libraries and dependencies. There are a few options for running containerized GPU workloads on the cluster.
====Running Containers in Singularity====
You can run containers on the cluster using Singularity, and give them access to the GPUs that Slurm has selected using the '''--nv''' option. For example:
singularity pull docker://tensorflow/tensorflow:latest-gpu
srun -c 8 --mem 10G --gres=gpu:1 singularity run --nv docker://tensorflow/tensorflow:latest-gpu python -c 'from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())'
This will produce output showing that the Tensorflow container is indeed able to talk to one GPU:
INFO: Using cached SIF image
2023-05-15 11:36:33.110850: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-15 11:36:38.799035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /device:GPU:0 with 22244 MB memory: -> device: 0, name: NVIDIA RTX A5500, pci bus id: 0000:03:00.0, compute capability: 8.6
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 8527638019084870106
xla_global_id: -1
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 23324655616
locality {
bus_id: 1
links {
}
}
incarnation: 1860154623440434360
physical_device_desc: "device: 0, name: NVIDIA RTX A5500, pci bus id: 0000:03:00.0, compute capability: 8.6"
xla_global_id: 416903419
]
Slurm's containment of the Slurm job to the correct set of GPUs is also passed through to the Singularity container; there is no need to specifically direct Singularity to use the right GPUs unless you are doing something unusual.
====Running Containers in Slurm====
Slurm itself also supports a '''--container''' option for jobs, which allows a whole job to be run inside a container. If you are able to [https://slurm.schedmd.com/containers.html convert your container to OCI Bundle format], you can pass it directly to Slurm instead of using Singularity from inside the job. However, Docker-compatible image specifiers can't be given to Slurm, only paths to OCI bundles on disk.
Stand-alone tools to download a Docker image from Docker Hub in OCI bundle format ('''skopeo''' and '''umoci''') are not yet installed on the cluster, but the method using the '''docker''' command should work.
Slurm containers ''should'' have access to their assigned GPUs, but it is not clear if tools like '''nvidia-smi''' are injected into the container, as they would be with Singularity or the nVidia Container Runtime.
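As a sketch of that '''docker'''-based method, following the approach in the Slurm containers documentation (this assumes '''runc''' is available to generate the bundle's '''config.json'''; adjust if it is not):

```shell
# Turn a Docker image into an OCI bundle that Slurm's --container can use.
mkdir -p ~/alpine-bundle/rootfs
# Flatten the image's filesystem into the bundle's rootfs directory.
docker export "$(docker create alpine:latest)" | tar -C ~/alpine-bundle/rootfs -xf -
# Generate a default config.json next to rootfs (runc assumed available).
(cd ~/alpine-bundle && runc spec)
# Then point Slurm at the bundle directory:
srun --container ~/alpine-bundle echo "hello from the bundle"
```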
====Running Containers in Docker====
You might be used to running containers with Docker, or containerized GPU workloads with the nVidia Container Runtime or Toolkit. Docker is installed on all the nodes and the daemon is running; if the '''docker''' command does not work for you, ask cluster-admin to add you to the right groups.
The '''nvidia''' runtime is set up and will automatically be used.
While Slurm configures each Slurm job with a cgroup that directs it to the correct GPUs, '''using Docker to run another container escapes Slurm's confinement''', and using '''--gpus=1''' will ''always'' use the ''first'' GPU in the system, whether or not that GPU is assigned to your job. When using Docker, you ''must'' consult the '''SLURM_JOB_GPUS''' (for '''sbatch''') or '''SLURM_STEP_GPUS''' (for '''srun''') environment variable and pass it along to your container. You should also impose limits on all other resources used by your Docker container, so that your whole job stays within the resources allocated by Slurm's scheduler. (TODO: find out how cgroups handle oversubscription between a Docker container and the Slurm container that launched it.)
An example of a working command is:
srun -c 1 --mem 4G --gres=gpu:2 bash -c 'docker run --rm --gpus=\"device=$SLURM_STEP_GPUS\" nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi'
Note that the double-quotes are included in the argument to '''--gpus''' as seen by the Docker client, and that '''bash''' and single-quotes are used to ensure that '''$SLURM_STEP_GPUS''' is evaluated within the job itself, and not on the head node.
0e2f12f049a7eb71c72eeba94895e468dff7bb91
Quick Reference Guide
0
35
314
288
2023-05-15T19:59:29Z
Anovak
4
/* General Commands */
wikitext
text/x-wiki
== General Commands ==
Get documentation on a command:
man <command>
Try the following commands:
man sbatch
man squeue
man scancel
man srun
== Submitting jobs ==
The following example script specifies a partition, time limit, memory allocation, and number of cores. All your scripts should specify values for these four parameters. You can also set additional parameters as shown, such as job name and output file. This script performs a simple task: it generates a file of random numbers and then sorts it. A detailed explanation of the script is available here.
#!/bin/bash
#
#SBATCH -p shared # partition (queue)
#SBATCH -c 1 # number of cores
#SBATCH --mem 100 # memory pool for all cores
#SBATCH -t 0-2:00 # time (D-HH:MM)
#SBATCH -o slurm.%N.%j.out # STDOUT
#SBATCH -e slurm.%N.%j.err # STDERR
for i in {1..100000}; do
    echo $RANDOM >> SomeRandomNumbers.txt
done
sort SomeRandomNumbers.txt
Now you can submit your job with the command:
sbatch myscript.sh
To test your job and find out when it is estimated to run, use the following (note this does not actually submit the job):
sbatch --test-only myscript.sh
== Information on Jobs ==
List all current jobs for a user:
squeue -u <username>
List all running jobs for a user:
squeue -u <username> -t RUNNING
List all pending jobs for a user:
squeue -u <username> -t PENDING
List all current jobs in the shared partition for a user:
squeue -u <username> -p shared
List detailed information for a job (useful for troubleshooting):
scontrol show jobid -dd <jobid>
List status info for a currently running job:
sstat --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID -j <jobid> --allsteps
Once your job has completed, you can get additional information that was not available during the run. This includes run time, memory used, etc.
To get statistics on completed jobs by jobID:
sacct -j <jobid> --format=JobID,JobName,MaxRSS,Elapsed
To view the same information for all jobs of a user:
sacct -u <username> --format=JobID,JobName,MaxRSS,Elapsed
== Controlling jobs ==
To cancel one job:
scancel <jobid>
To cancel all the jobs for a user:
scancel -u <username>
To cancel all the pending jobs for a user:
scancel -t PENDING -u <username>
To cancel one or more jobs by name:
scancel --name myJobName
To hold a particular job from being scheduled:
scontrol hold <jobid>
To release a particular job to be scheduled:
scontrol release <jobid>
To requeue (cancel and rerun) a particular job:
scontrol requeue <jobid>
== Job arrays and useful commands ==
As shown in the commands above, it's easy to refer to one job by its Job ID, or to all your jobs via your username. What if you want to refer to a subset of your jobs? The answer is to submit your job set as a job array. Then you can use the job array ID to refer to the set when running SLURM commands.
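For example, here is a sketch of submitting and addressing a 10-task array (assuming '''myscript.sh''' is an existing batch script, and the '''chunk_N.txt''' input naming is purely hypothetical):

```shell
# Submit the same script as 10 array tasks, with indices 1 through 10.
sbatch --array=1-10 myscript.sh
# Each task runs with SLURM_ARRAY_TASK_ID set to its index, so the script
# can pick its own input, e.g.:
INPUT="chunk_${SLURM_ARRAY_TASK_ID}.txt"
# The whole set can then be controlled via the array's job ID, e.g.
#   scancel <array_jobid>      # cancel every task in the array
#   scancel <array_jobid>_7    # cancel just task 7
```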
== SLURM job arrays ==
To cancel an indexed job in a job array:
scancel <jobid>_<index>
e.g.
scancel 1234_4
To find the original submit time for your job array:
sacct -j 32532756 -o submit -X --noheader | uniq
== Advanced (but useful!) Commands ==
The following commands work for individual jobs and for job arrays, and allow easy manipulation of large numbers of jobs. You can combine these commands with the parameters shown above to provide great flexibility and precision in job control. (Note that all of these commands are entered on one line)
Suspend all running jobs for a user (takes into account job arrays):
squeue -ho %A -u <username> -t R | xargs -n 1 scontrol suspend
Resume all suspended jobs for a user:
squeue -o "%.18A %.18t" -u <username> | awk '{if ($2 =="S"){print $1}}' | xargs -n 1 scontrol resume
After resuming, check if any are still suspended:
squeue -ho %A -u $USER -t S | wc -l
View Cluster State:
shost
3e3bdc0ef3cba6b9eafdbb87b5ce0559f50c48ad
Genomics Institute Computing Information
0
6
315
298
2023-05-15T20:00:46Z
Anovak
4
/* Slurm at the Genomics Institute */
wikitext
text/x-wiki
Welcome to the Genomic Institute Computing Information Repository! Browse the below topics for help in the area you are curious about.
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment (PRISM)==
*[[Access to the Firewalled Compute Servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[GPU Resources]]
*[[Quick Reference Guide]]
*[[Slurm Tips for VG]]
*[[Slurm Tips for Toil]]
==Kubernetes Information==
*[[Computational Genomics Kubernetes Installation]]
*[[Undiagnosed Disease Project Kubernetes Installation]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
e9889c977eea429ee5ec73501a3fc72c2fa852b8
316
315
2023-05-15T20:00:58Z
Anovak
4
/* Slurm at the Genomics Institute */
wikitext
text/x-wiki
Welcome to the Genomic Institute Computing Information Repository! Browse the below topics for help in the area you are curious about.
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment (PRISM)==
*[[Access to the Firewalled Compute Servers]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[GPU Resources]]
*[[Quick Reference Guide]]
*[[Slurm Tips for vg]]
*[[Slurm Tips for Toil]]
==Kubernetes Information==
*[[Computational Genomics Kubernetes Installation]]
*[[Undiagnosed Disease Project Kubernetes Installation]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
7b4c419ee0d7f71633b5e3b5f8e3d9ace0bb6de5
340
316
2023-06-14T22:24:21Z
Weiler
3
/* GI Firewalled Computing Environment (PRISM) */
wikitext
text/x-wiki
Welcome to the Genomic Institute Computing Information Repository! Browse the below topics for help in the area you are curious about.
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment (PRISM)==
*[[Access to the Firewalled Compute Servers]]
*[[Firewalled Environment Storage Overview]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[GPU Resources]]
*[[Quick Reference Guide]]
*[[Slurm Tips for vg]]
*[[Slurm Tips for Toil]]
==Kubernetes Information==
*[[Computational Genomics Kubernetes Installation]]
*[[Undiagnosed Disease Project Kubernetes Installation]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
ad686f3283d1edef5addcbe22d4aadaee24606b7
346
340
2023-06-14T22:55:20Z
Weiler
3
wikitext
text/x-wiki
Welcome to the Genomic Institute Computing Information Repository! Browse the below topics for help in the area you are curious about.
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment (PRISM)==
*[[Access to the Firewalled Compute Servers]]
*[[Computing Resources Overview]]
*[[Firewalled Environment Storage Overview]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud OpenStack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[GPU Resources]]
*[[Quick Reference Guide]]
*[[Slurm Tips for vg]]
*[[Slurm Tips for Toil]]
==Kubernetes Information==
*[[Computational Genomics Kubernetes Installation]]
*[[Undiagnosed Disease Project Kubernetes Installation]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''.
49d2750023c4c01c5009891997e3626abfb91f5c
Slurm Tips for vg
0
37
320
319
2023-05-15T21:43:24Z
Anovak
4
/* Misc Tips */
wikitext
text/x-wiki
This page explains how to set up a development environment for [https://github.com/vgteam/vg vg] on the Phoenix cluster.
==Setting Up==
1. After connecting to the VPN, connect to the cluster head node:
ssh phoenix.prism
This node is relatively small, so you shouldn't run real work on it, but it is the place you need to be to submit Slurm jobs.
2. Make yourself a user directory under '''/private/groups''', which is where large data must be stored. For example, if you are in the Paten lab:
mkdir /private/groups/patenlab/$USER
3. (Optional) Link it over to your home directory, so it is easy to use storage there to store your repos. The '''/private/groups''' storage may be faster than the home directory storage.
mkdir -p /private/groups/patenlab/$USER/workspace
ln -s /private/groups/patenlab/$USER/workspace ~/workspace
4. Make sure you have SSH keys created and added to GitHub.
 cat ~/.ssh/id_ed25519.pub || (ssh-keygen -t ed25519 && cat ~/.ssh/id_ed25519.pub)
# Paste into https://github.com/settings/ssh/new
5. Make a place to put your clone, and clone vg:
mkdir -p ~/workspace
cd ~/workspace
git clone --recursive git@github.com:vgteam/vg.git
cd vg
6. vg's dependencies should already be installed on the cluster nodes. If any of them seem to be missing, tell cluster-admin@soe.ucsc.edu to install them.
7. Build vg as a Slurm job. This will send the build out to the cluster as a 64-core, 80G memory job, and keep the output logs in your terminal.
srun -c 64 --mem=80G make -j64
This will leave your vg binary at '''~/workspace/vg/bin/vg'''.
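If you would rather submit the build as a batch job and disconnect, the same resources can go in a script. This is a minimal sketch, not an official recipe: the core and memory numbers mirror the '''srun''' command above, while the script name and the '''--time''' limit are assumptions you should adjust.

```shell
# Sketch of a batch-script version of the build; -c and --mem mirror the
# srun command above, and the --time limit is an assumption.
cat > build-vg.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=build-vg
#SBATCH -c 64
#SBATCH --mem=80G
#SBATCH --time=04:00:00
cd ~/workspace/vg
make -j"$SLURM_CPUS_PER_TASK"
EOF
# Submit from the head node with: sbatch build-vg.sh
```

Unlike the '''srun''' form, the logs go to a Slurm output file instead of your terminal, so the build survives a dropped SSH session.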
==Misc Tips==
* If you want an interactive session with appreciable resources, you can schedule one with '''srun'''. For example, to get 16 cores and 120G memory all for you, run:
srun -c 16 --mem 120G --pty bash -i
* To send out a job without making a script file for it, use '''sbatch --wrap "your command here"'''.
* Any option you would put on an '''#SBATCH''' line in a batch script can also be passed directly on the '''sbatch''' command line!
* You can use [https://github.com/CLIP-HPC/SlurmCommander#readme Slurm Commander] to watch the state of the cluster with the '''scom''' command.
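To make the '''--wrap''' tip concrete, here is a small helper function; the function name and the vg command are made up for this example, and '''echo''' stands in for the real '''sbatch''' call so you can see the command that would be submitted.

```shell
# Hypothetical helper combining the tips above: options given on the
# command line plus --wrap for a one-off command. 'echo' stands in for
# the real sbatch invocation, so the assembled command is just printed.
submit_oneoff() {
    local cores=$1 mem=$2
    shift 2
    echo sbatch -c "$cores" --mem "$mem" --wrap "$*"
}

submit_oneoff 8 16G vg stats -z graph.vg
# prints: sbatch -c 8 --mem 16G --wrap vg stats -z graph.vg
```

Drop the '''echo''' once you are happy with the command being built.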
06be2d97ad3c4c8ec953960fd47432e22d1c3941
Slurm Tips for Toil
0
38
329
318
2023-06-05T21:29:47Z
Anovak
4
wikitext
text/x-wiki
Here are some tips for running Toil workflows on the Phoenix Slurm cluster. Most likely you will want to run WDL workflows, but some of these tips also apply to other workflows such as Cactus. You can also consult [https://github.com/DataBiosphere/toil/blob/master/docs/running/wdl.rst#running-wdl-with-toil the Toil documentation on WDL workflows].
* Because the new WDL interpreter in Toil isn't yet in any release, you will want to install Toil from source with WDL support:
pip3 install git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl]
* You will then need to make sure your '''~/.local/bin''' directory is on your PATH. Open up your '''~/.bashrc''' file and add:
export PATH=$PATH:$HOME/.local/bin
Then make sure to log out and back in again.
* For Toil options, you will want '''--batchSystem slurm''' to make Toil use Slurm, and '''--batchLogsDir ./logs''' (or some other location on a shared filesystem) so the Slurm logs do not get lost.
* You may be able to speed up your workflow with '''--caching true''', to cache data on nodes to be shared among multiple simultaneous tasks.
* If using '''toil-wdl-runner''', you might want to add '''--jobStore ./jobStore''' to make sure the job store is in a defined, shared location so that you can use '''--restart''' later.
* If using '''toil-wdl-runner''', you will want to set the '''SINGULARITY_CACHEDIR''' and '''MINIWDL__SINGULARITY__IMAGE_CACHE''' environment variables for your workflow to locations on shared storage, possibly the default cache locations in your home directory. Otherwise Toil will set them to node-local directories, and thus re-download images for each workflow run and for each cluster node. To avoid this, before your run or in your '''~/.bashrc''', you could, for example:
export SINGULARITY_CACHEDIR=$HOME/.singularity/cache
export MINIWDL__SINGULARITY__IMAGE_CACHE=$HOME/.cache/miniwdl
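Putting the tips on this page together, a submission script might look like the following sketch. The workflow name '''workflow.wdl''', the inputs file '''inputs.json''', and the script name are placeholders; the flags are the ones described above.

```shell
# Sketch combining the flags from this page (workflow.wdl and inputs.json
# are placeholders). Written to a file so the invocation can be reused.
cat > run-wdl.sh <<'EOF'
#!/bin/bash
export SINGULARITY_CACHEDIR=$HOME/.singularity/cache
export MINIWDL__SINGULARITY__IMAGE_CACHE=$HOME/.cache/miniwdl
toil-wdl-runner workflow.wdl inputs.json \
    --batchSystem slurm \
    --batchLogsDir ./logs \
    --jobStore ./jobStore \
    --caching true
EOF
chmod +x run-wdl.sh
```

Run it from a directory on shared storage so '''./logs''' and '''./jobStore''' are visible to all nodes, and add '''--restart''' to the command on a re-run after a failure.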
3d0667981261dfffc315c727cb0a3d6d02d97c9f
AWS Account List and Numbers
0
22
328
266
2023-05-27T19:58:41Z
Weiler
3
wikitext
text/x-wiki
This is a list of our currently available AWS accounts and their account numbers:
ucsc-bd2k : 862902209576
ucsc-toil-dev : 318423852362
ucsc-vg-dev : 781907127277
ucsc-platform-dev : 719818754276
comparative-genomics-dev : 162786355865
nanopore-dev : 270442831226
ucsc-cgp-production : 097093801910
platform-hca-dev : 122796619775
anvil-dev : 608666466534
gi-gateway : 652235167018
pangenomics : 422448306679
braingeneers : 443872533066
ucsctreehouse : 238605363322
ucsc-bisti-dev : 851631505710
ucsc-genome-browser : 784962239183
dockstore-dev : 635220370222
ucsc-spatial : 541180793903
platform-hca-prod : 542754589326
miga-lab : 156518225147
platform-anvil-dev : 289950828509
platform-anvil-prod : 465330168186
platform-anvil-portal : 166384485414
agc-runs : 598929688444
sequencing-center-cold-store : 436140841220
c2811e5d92919ef08d25644f74409ccd11e95920
Quick Start Instructions to Get Rolling with OpenStack
0
26
331
330
2023-06-06T15:09:29Z
Anovak
4
/* Launch a New Instance */
wikitext
text/x-wiki
__TOC__
==Request an OpenStack Account==
Once you have [http://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access PRISM/GI VPN access], you can request an OpenStack account. You will need to send an email to cluster-admin@soe.ucsc.edu asking for access, and let us know which lab you are in, or who your PI is, so we can place you in the right OpenStack group.
==Create a SSH Public/Private Keypair==
To log into an OpenStack VM instance, you will need an SSH public key. The key is "injected" into the instance upon creation, and only someone with that key (i.e. you) will be able to log in via SSH initially. If you already have an SSH public and private key that you use elsewhere, you can use that one and skip to the next step. If you don't have an SSH keypair set up yet, log into the UNIX-compatible machine you will be '''logging in from''' (a Mac/Apple computer will also work) and run the 'ssh-keygen' command. If you are behind the VPN, you can first log into mustard, crimson or razzmatazz, which are Linux servers. The command will look something like this:
$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/public/home/frank/.ssh/id_rsa):
Created directory '/public/home/frank/.ssh'.
Enter passphrase (empty for no passphrase): [JUST HIT ENTER]
Enter same passphrase again: [JUST HIT ENTER]
Your identification has been saved in /public/home/frank/.ssh/id_rsa.
Your public key has been saved in /public/home/frank/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:dhJG1A3gcwj7Mz17ommt3NIczMVVgrzp8Tf6F1X4jpI
The key's randomart image is:
+---[RSA 2048]----+
| ..+o.+ ..o.|
| = .. + o..|
| . * .. + ..|
| o = * o|
| So+o + o.|
| . =+oE ooo|
| +o.....o|
| .o++o . .|
| .=o. ...|
+----[SHA256]-----+
You will then have a new directory, "~/.ssh", and inside that directory you will have a file called "id_rsa.pub". That is your SSH public key. You will need this in the next step in order to set up your key in OpenStack.
==Log In To giCloud==
Once you have been notified that your account has been set up and you have been given login credentials, connect to the VPN and then go to the login page in your favorite web browser:
http://gicloud.prism
To log in, enter your username and password. You will also see a "Domain" field; enter the word "default" there. Click "Log In". You will be logged into your group's summary page.
==Upload your SSH Public Key==
After creating your new key in the above "Create a SSH Public/Private Keypair" step, you will need to upload that key into OpenStack. Once you are logged in, on the left hand navigation menu, click "Project", then in the submenu, select "Compute", and finally select "Key Pairs". It should take you to the "Key Pairs" window as shown here.
[[File:Keypairs.png|900px]]
Next click the "Import Public Key" button on the top right of the window. In the resulting window, name your key in the "Key Pair Name" field. Name it something descriptive like "laptop-key" if the key is on your laptop, or "mustard-key" if you are logged into mustard, etc.
'''Your key must be an RSA key!''' The newer ED25519 keys '''do not work''' with our version of OpenStack.
To get your key, open a terminal window and type "cat ~/.ssh/id_rsa.pub" to print your full key, like so:
$ cat ~/.ssh/id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAyVKNfdBbDIk7Iq8JmL+u3vxAn4M1iaQgMU5tHJhMSAYBZEZRLZAFc+Qovxe5zzs1ixte9lCipLep39q2I4U8XND17nYliZ4HVM4MW4GsMUfKsgX2FI3mB2vAQ9pZSLkAhTg2D+92uALUSSv1cDZhTqo7DuPRX2Upxyd5QbRL6TRFswBjHz2vY/JpaPQm1S1d10mokPpmxehLfwp0mVgmz1Uv/6FflqiZ68DhDN67cs1yQgWYXQ01IHPjzTKRwCuZVkgT99rkoqy6TkAyrvsfzYPZbIA2y+ovOBzq6WCUT9gp5Jx/UE6CxLSmAuGPAJkV5D/twKIe75xc+5jdi3I1cgKw== user@laptop
Copy that whole line, starting with "ssh-rsa" all the way through the very last character, including the "user@laptop" bit (which may be different for you; just be sure to include it).
Then back in the OpenStack Key Pair dialogue window, paste in the keypair in the "Public Key" window, then click "Import Key". The key should then appear in the key list.
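If you want to double-check that a key is RSA before uploading it, '''ssh-keygen -l''' prints the key type. The snippet below demonstrates this on a throwaway key in a temporary directory; run the same '''ssh-keygen -l''' check on your real '''~/.ssh/id_rsa.pub'''.

```shell
# Demonstration on a throwaway key: 'ssh-keygen -l' reports the key type,
# so you can confirm a key is RSA before uploading it to OpenStack.
tmp=$(mktemp -d)
ssh-keygen -q -t rsa -N '' -f "$tmp/demo_key"
if ssh-keygen -l -f "$tmp/demo_key.pub" | grep -q '(RSA)'; then
    echo "RSA key: OK to upload"
fi
```

If the check prints nothing for your real key, it is not RSA; generate one with 'ssh-keygen -t rsa' as shown in the earlier section.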
==Launch a New Instance==
We are now ready to launch our new VM instance. On the left navigation menu, select "Project", then in the submenu, select "Compute", and finally select "Instances". You will see any currently running instances in your group in the resulting screen.
Next you need to click the "Launch Instance" button on the top right. You will be put into the "Details" tab in the instance creation dialogue. You need to choose an instance name and enter it into the "Instance Name" field. It should include your username as a prefix so that others know who owns each instance. Something like "frank-newtest1" would work well. You can ignore the "Description" field, "Availability Zone" should be "nova" and "Count" should be "1".
Next click the "Source" tab on the left. In the "Source" menu, in the "Select Boot Source" field, select "Image" and next to it select "No" for "Create New Volume". Then, in the list of images below, choose your image and click the little "Up Arrow" icon to its right to add it.
Next click the "Flavor" tab on the left. In that menu, choose how much CPU, RAM and disk space you want for your new VM. Some images have minimum requirements, and as such some of the smaller flavors may not be available. Select your flavor by clicking the little "Up Arrow" icon on the right of your flavor.
Next click the "Key Pair" tab on the left. Click the little "Up Arrow" to the right of the key pair you created in the earlier "Create a SSH Public/Private Keypair" step.
Ignore the rest of the options on the left, you have configured all you need to launch the instance. Click the blue "Launch Instance" button on the bottom right of your window, as seen below:
[[File:Launch.png|850px]]
You will be taken back to the Instances Summary page and you should see your new instance launching. After a bit, your instance will change from "Spawning" to "Running". This means the instance is now booting, and it should finish booting in a minute or two. In the meantime, we will need to attach a "Floating IP" address to your instance so that you can SSH into it. On the right side of your running instance you should see a drop-down menu, usually with the "Create Snapshot" option pre-selected. Click the drop-down menu arrow to open that menu, and select "Associate Floating IP".
In the "Associate Floating IP" dialogue, click the drop down menu to see if any IP addresses are already available, and if so, go ahead and select one. If there are none available, click the little "+" button to the right to allocate a floating IP address. It will ask you what Pool to use, select "ext-net". You can put in a description if you want but most folks leave that field blank. Then click "Allocate IP". It will take you back one menu level. It will have a field "Port to be Associated", just leave that alone with the default that is already there. Click the blue "Associate" button on the bottom right of the window.
You will be returned to the "Instances Summary" page again. You will see your instance running, and it should now list a "Floating IP" that it is running under. That is the IP that you will use to SSH to the instance.
==Connect to Your New Instance==
Now that your instance is up and running, let's SSH to it and get going! '''From the computer you created your SSH keys on,''' SSH to your instance using the username as the OS type you chose (ubuntu, centos, etc), and the Floating IP address your instance has. '''You must be connected to the VPN for this to work!''' Example:
$ ssh ubuntu@10.50.100.67
If you launched a CentOS instance, it would instead be "ssh centos@10.50.100.67", as appropriate. Assuming everything went as planned, you will be logged into your new Linux instance as the "ubuntu" or "centos" user. This is an unprivileged user, but it has full sudo rights, so you can do whatever administration you need to do. If you get a "Connection Refused" error when trying to SSH in, your instance isn't quite through launching yet; try again in about 30 seconds. At this point it is assumed you have a few systems administration skills under your belt, or at least have some time to query Google for how to perform various Linux tasks as necessary. Your instance has full outbound access to the greater Internet, so you can download things from the Internet, run "apt-get install" or "yum update" or whatever is appropriate, and install any software you need to get your work done.
'''NOTE:''' You are the Systems Administrator of your instance - we cannot support questions on how to administer Linux for you. If OpenStack itself is having issues then please let us know, but please defer questions like "How do I install software on Ubuntu" to Google searches.
==Storage on Your New Instance==
Most of your storage on your new instance will be located in the /mnt directory, as seen by a "df -h" command on your instance:
ubuntu@erich1:~$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 16G 0 16G 0% /dev
tmpfs 3.2G 676K 3.2G 1% /run
/dev/vda1 20G 975M 19G 5% /
tmpfs 16G 0 16G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/vda15 105M 3.4M 102M 4% /boot/efi
/dev/vdb1 1.0T 1.1G 1023G 1% /mnt
tmpfs 3.2G 0 3.2G 0% /run/user/1000
Notice that "/mnt" has 1TB of disk space, so store all your big important data in /mnt. Avoid storing data on "/" whenever possible to prevent issues with the root filesystem filling up. The exact amount of storage available will depend on what flavor you chose when creating the instance.
==Instance Control Options==
Just a few notes on controlling your instances. They are fully functioning Linux machines, so a "sudo reboot" will reboot the machine, "sudo poweroff" will shut it down, etc. In cloud parlance, "Shut Down" means the instance is still there but the power is off. "Terminated" means it's fully deleted and is unrecoverable, so be sure you want to delete your instance before you do so. We do not back instances up. We also have no access to your instance so we cannot log in and see what's going on.
You can control your instance in several ways from the OpenStack web interface, in the Instance Summary page. On the right side of your instance in the list will be that little drop down menu. Options of interest are:
'''1: Create Snapshot'''
Never use this option as we have not implemented snapshotting in this environment.
'''2: View Log'''
This will show you the boot/console log of the instance, so you can see if anything is causing issues.
'''3: Hard Reboot Instance'''
This will hard reboot your instance, like hitting the power button: the instance powers off and then powers back on moments later. Useful if your instance is hosed because of a software crash or similar problem.
'''4: Delete Instance'''
This will permanently destroy your instance. It will be deleted and is unrecoverable. It will also free up the resources it was using so that others can use them. This is useful if the group quotas have been reached and some old instances need to be cleaned out to make room for new ones.
'''5: Start Instance'''
This option will be available if the instance is in the "Shut Down" state. It will boot up the instance when this option is invoked.
Do not use the other options you may see there, most have not been implemented in our deployment of OpenStack.
==Changing Your OpenStack Web Interface Password==
Once you have logged in to the Web Interface, you can change your password by doing the following.
On the top right of the OpenStack web interface, you should see a little icon with your username on it. Click that icon to expand the drop down menu there, and select "Settings". Then in the next window, on the left navigation bar, you should see the "Change Password" button. Complete the Change Password dialogue to change your password. You may have to log in again after changing your password.
==Networking==
Your instances are connected to each other and the Internet at 10Gb/s. Of course, actual transfer speeds will vary based on disk speed, the speed of the location you are transferring data to or from, and other factors.
Your instance will be located in a private network that can only be seen by other instances in your group. Other OpenStack groups are logically separated into their own networks and your instance cannot route to them. Also, no one can access your instance unless they have a VPN account with us, so your instances are completely fenced off from the Greater Internet inbound, which means you are largely secure against script kiddies and hackers. You are able to connect outbound from your instances.
==Etiquette==
There is one main thing to remember when using instances in OpenStack. When you create an instance, it consumes CPU and RAM and, most importantly, it pins disk space. If you use up all the disk, CPU and RAM quota for your group, then others have no resources left to create their own instances. The best plan of action is to fire up your VM, keep it up while you need it, then copy your data off and delete the instance. Document the steps taken to create your instance so that you can do it again if needed. If the physical node that your instance resides on blows up, your instance is lost forever and we have no backups, so it is up to you to back up important data. It's also not good form to spin up an instance, store data there, and then not log in for months at a time; then you are pinning resources that others may need for urgent work. Try to be a good neighbor!
= Access to the Firewalled Compute Servers =
Before you can access the firewalled environment (Prism), you must get VPN access to it, which is detailed here:
[[Requirement for users to get GI VPN access]]
== Account and Storage Cost ==
Costs for having an active UNIX account and for storage (per TB) are listed in this document, under "Genomics Project Support", specifically "Genomics IT Systems User Support per user" and "Genomics Data Storage per TB":
https://planning.ucsc.edu/budget/rates-and-assessments/recharge-rates/docs/2021-22-approved-recharge-rates.pdf
== Account Expiration ==
Your UNIX account will have an expiration date associated with it after creation. If your account was created in January, February or March, then your account expiration date will be July 1st of the '''current year'''. If the account was created after March, then your expiration date will be July 1st of the '''following year'''.
You will receive notice by email when your account is about to expire. To renew, simply ask the PI that sponsored you (who will be named in the notice) to email '''cluster-admin@soe.ucsc.edu''' requesting that your account be renewed for another year.
If your account expires, the account will be suspended and you will no longer be able to log in or view any data you may have in our systems. Any automated scripts (owned by you) that run via cron or other mechanisms will cease to function.
== Server Types and Management==
After confirming your VPN software is working, you can ssh into one of the compute servers behind the VPN:
'''crimson.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space, CentOS 7.9
'''razzmatazz.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space, Ubuntu 22.04
'''mustard.prism''': 1.5TB RAM, 160 cores, 9TB local scratch space, Ubuntu 22.04
These servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
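For example, once the VPN is up you can SSH in with your UNIX username. A hypothetical ~/.ssh/config entry saves retyping it (replace "myusername" with your actual account name):

```shell
# Hypothetical ~/.ssh/config entry; "myusername" is a placeholder for
# your real UNIX account name. Afterwards "ssh mustard.prism" works
# directly (while connected to the VPN).
mkdir -p ~/.ssh && chmod 700 ~/.ssh
cat >> ~/.ssh/config <<'EOF'
Host *.prism
    User myusername
    ServerAliveInterval 60
EOF
```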
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory is located at "/private/home/username" and has a 30GB quota. The group storage directories are created per PI, and each group directory has a 15TB quota. For example, if David Haussler is the PI that you report to directly, then the directory would exist as /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so you must be mindful of everyone's data usage and share the 15TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the '/usr/bin/viewquota' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /private/groups/hausslerlab for example, you would do:
$ viewquota hausslerlab
Project quota on /export (/dev/mapper/export)
Project ID Used Soft Hard Warn/Grace
---------- ---------------------------------
hausslerlab 1.8T 15T 16T 00 [------]
== Actually Doing Work and Computing ==
When doing research, running jobs and the like, please be careful of your resource consumption on the server you are on. Don't run so many threads or processes at once that you overrun the available RAM or disk IO. If you are not sure of your potential RAM, CPU or disk impact, start small with one or two processes and work your way up from there. Also, before running your jobs, check what is already happening on the server by using the 'top' command to see who and what is running and what resources are already being consumed. If, after starting a process, you realize that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
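A quick pre-flight check along those lines might look like:

```shell
# Check current load and headroom before starting heavy jobs.
nproc                     # how many cores the machine has
free -h                   # RAM in use vs. available
uptime                    # load averages (1/5/15 minutes)
top -b -n 1 | head -n 15  # snapshot of the busiest processes
```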
== The Firewall ==
All servers in this environment are behind a firewall, and as such, you must connect to the VPN in order to access them. They will not be accessible from the greater Internet without the VPN. You will, however, be able to connect outbound from them to other servers on the Internet to copy data in, sync git repos, and the like; only inbound connections are blocked. All machines behind the firewall have the private domain name suffix "*.prism".
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear in the event of a disk failure or anything else. Do not store important data there. If it is important, it should be moved somewhere else very soon after creation.
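A typical pattern is to do the heavy I/O in scratch, copy the results somewhere durable, then clean up. A sketch (the directory names are illustrative, and the fallback to /tmp is only so the sketch runs anywhere):

```shell
# Work in scratch, then move results to durable storage and clean up.
BASE=/scratch
[ -d "$BASE" ] || BASE=/tmp            # fallback so the sketch runs anywhere
JOB="$BASE/$(id -un)/myjob"            # illustrative directory name
mkdir -p "$JOB"
echo "intermediate data" > "$JOB/result.txt"   # stand-in for real work
mkdir -p "$HOME/results"
cp "$JOB/result.txt" "$HOME/results/"  # copy results off scratch promptly
rm -rf "$JOB"                          # free the scratch space when done
```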
98cbf2ca3d135338aeb24d3e57d9b1ef2694536a
333
332
2023-06-12T21:52:43Z
Weiler
3
/* Server Types and Management */
wikitext
text/x-wiki
Before you can access the firewalled environment (Prism), you must get VPN access to it, which is detailed here:
[[Requirement for users to get GI VPN access]]
== Account and Storage Cost ==
Costs for having an active UNIX account and for storage (per TB) are listed in this document, under "Genomics Project Support", specifically "Genomics IT Systems User Support per user" and "Genomics Data Storage per TB":
https://planning.ucsc.edu/budget/rates-and-assessments/recharge-rates/docs/2021-22-approved-recharge-rates.pdf
== Account Expiration ==
Your UNIX account will have an expiration date associated with it after creation. If your account was created in January, February or March, then your account expiration date will be July 1st of the '''current year'''. If the account was created after March, then your expiration date will be July 1st of the '''following year'''.
You will receive notice by email when your account is about to expire. To renew, simply ask the PI that sponsored you (which will be included in the notice) to email '''cluster-admin@soe.ucsc.edu''' requesting that your account to be renewed for another year.
If your account expires, the account will be suspended and you will no longer be able to login or view any data you may have in our systems. Any automated scripts (owned by you) that run via cron or other mechanisms will cease to function.
== Server Types and Management==
After confirming your VPN software is working, you can ssh into one of the compute servers behind the VPN:
'''crimson.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space, CentOS 7.9
'''razzmatazz.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space, Ubuntu 22.04
'''mustard.prism''': 1.5TB RAM, 160 cores, 9TB local scratch space, Ubuntu 22.04
These servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage; home directories and group storage directories. Your home directory will be located as "/private/home/username" and has a 30GB quota. The group storage directories are created per PI, and each group directory has a 15TB quota. For example, if David Haussler is the PI that you report to directly, then the directory would exist as /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each of those group directories are shared by the lab it belongs to, so you must be wary of everyone's data usage and share the 15TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the '/usr/bin/viewquota' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /private/groups/hausslerlab for example, you would do:
$ viewquota hausslerlab
Project quota on /export (/dev/mapper/export)
Project ID Used Soft Hard Warn/Grace
---------- ---------------------------------
hausslerlab 1.8T 15T 16T 00 [------]
== Actually Doing Work and Computing ==
When doing research, running jobs and the like, please be careful of your resource consumption on the server you are on. Don't run too many threads or cores at once if such a thing overruns the RAM available or the disk IO available. If you are not sure of your potential RAM, CPU or disk impact, start small with one or two processes and work your way up from there. Also, before running your stuff, check what else is already happening on the server by using the 'top' command to see who else and what else is running and what kind of resources are already being consumed. If, after starting a process, you realize that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== The Firewall ==
All servers are behind a firewall in this environment, and as such, you must connect to the VPN in order to access them. They will not be accessible from the greater Internet without VPN. Although you will be able to connect outbound from them to other servers on the internet to copy data in, sync git repos, stuff like that. It is only inbound connections that will be blocked. All machines behind the firewall have the private domain name suffix of "*.prism".
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear in the event of a disk failure or anything else. Do not store important data there. If it is important, it should be moved somewhere else very soon after creation.
d4a1b348245d89d389b026b695a2685b0751bdbd
334
333
2023-06-12T21:58:48Z
Weiler
3
wikitext
text/x-wiki
Before you can access the firewalled environment (Prism), you must get VPN access to it, which is detailed here:
[[Requirement for users to get GI VPN access]]
== Account and Storage Cost ==
Costs for having an active UNIX account and for storage (per TB) are listed in this document, under "Genomics Project Support", specifically "Genomics IT Systems User Support per user" and "Genomics Data Storage per TB":
https://planning.ucsc.edu/budget/rates-and-assessments/recharge-rates/docs/2021-22-approved-recharge-rates.pdf
== Account Expiration ==
Your UNIX account will have an expiration date associated with it after creation. If your account was created in January, February or March, then your account expiration date will be July 1st of the '''current year'''. If the account was created after March, then your expiration date will be July 1st of the '''following year'''.
You will receive notice by email when your account is about to expire. To renew, simply ask the PI that sponsored you (which will be included in the notice) to email '''cluster-admin@soe.ucsc.edu''' requesting that your account to be renewed for another year.
If your account expires, the account will be suspended and you will no longer be able to login or view any data you may have in our systems. Any automated scripts (owned by you) that run via cron or other mechanisms will cease to function.
== Server Types and Management==
After confirming your VPN software is working, you can ssh into one of the compute servers behind the VPN:
'''crimson.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space, CentOS 7.9
'''razzmatazz.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space, Ubuntu 22.04
'''mustard.prism''': 1.5TB RAM, 160 cores, 9TB local scratch space, Ubuntu 22.04
These servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage; home directories and group storage directories. Your home directory will be located as "/private/home/username" and has a 30GB quota. The group storage directories are created per PI, and each group directory has a 15TB quota. For example, if David Haussler is the PI that you report to directly, then the directory would exist as /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each of those group directories are shared by the lab it belongs to, so you must be wary of everyone's data usage and share the 15TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the '/usr/bin/viewquota' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /private/groups/hausslerlab for example, you would do:
$ viewquota hausslerlab
Project quota on /export (/dev/mapper/export)
Project ID Used Soft Hard Warn/Grace
---------- ---------------------------------
hausslerlab 1.8T 15T 16T 00 [------]
== Actually Doing Work and Computing ==
When doing research, running jobs and the like, please be careful of your resource consumption on the server you are on. Don't run too many threads or cores at once if such a thing overruns the RAM available or the disk IO available. If you are not sure of your potential RAM, CPU or disk impact, start small with one or two processes and work your way up from there. Also, before running your stuff, check what else is already happening on the server by using the 'top' command to see who else and what else is running and what kind of resources are already being consumed. If, after starting a process, you realize that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== The Firewall ==
All servers are behind a firewall in this environment, and as such, you must connect to the VPN in order to access them. They will not be accessible from the greater Internet without VPN. Although you will be able to connect outbound from them to other servers on the internet to copy data in, sync git repos, stuff like that. It is only inbound connections that will be blocked. All machines behind the firewall have the private domain name suffix of "*.prism".
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear in the event of a disk failure or anything else. Do not store important data there. If it is important, it should be moved somewhere else very soon after creation.
== The Phoenix Cluster ==
This is a cluster of ~20 nodes, some of which have GPUs in them. Each node generally has about 2TB RAM and 128 cores, although the cluster is heterogeneous and has multiple node types.
The cluster head node, from which all jobs are submitted via the SLURM job scheduling framework, is """phoenix.prism""". To learn more about how to use Slurm, refer to:
c0d69ad79b3cf2d8b1278b358442794c89612355
335
334
2023-06-12T22:00:29Z
Weiler
3
wikitext
text/x-wiki
Before you can access the firewalled environment (Prism), you must get VPN access to it, which is detailed here:
[[Requirement for users to get GI VPN access]]
== Account and Storage Cost ==
Costs for having an active UNIX account and for storage (per TB) are listed in this document, under "Genomics Project Support", specifically "Genomics IT Systems User Support per user" and "Genomics Data Storage per TB":
https://planning.ucsc.edu/budget/rates-and-assessments/recharge-rates/docs/2021-22-approved-recharge-rates.pdf
== Account Expiration ==
Your UNIX account will have an expiration date associated with it after creation. If your account was created in January, February or March, then your account expiration date will be July 1st of the '''current year'''. If the account was created after March, then your expiration date will be July 1st of the '''following year'''.
You will receive notice by email when your account is about to expire. To renew, simply ask the PI that sponsored you (which will be included in the notice) to email '''cluster-admin@soe.ucsc.edu''' requesting that your account to be renewed for another year.
If your account expires, the account will be suspended and you will no longer be able to login or view any data you may have in our systems. Any automated scripts (owned by you) that run via cron or other mechanisms will cease to function.
== Server Types and Management==
After confirming your VPN software is working, you can ssh into one of the compute servers behind the VPN:
'''crimson.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space, CentOS 7.9
'''razzmatazz.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space, Ubuntu 22.04
'''mustard.prism''': 1.5TB RAM, 160 cores, 9TB local scratch space, Ubuntu 22.04
These servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage; home directories and group storage directories. Your home directory will be located as "/private/home/username" and has a 30GB quota. The group storage directories are created per PI, and each group directory has a 15TB quota. For example, if David Haussler is the PI that you report to directly, then the directory would exist as /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each of those group directories are shared by the lab it belongs to, so you must be wary of everyone's data usage and share the 15TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the '/usr/bin/viewquota' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /private/groups/hausslerlab for example, you would do:
$ viewquota hausslerlab
Project quota on /export (/dev/mapper/export)
Project ID Used Soft Hard Warn/Grace
---------- ---------------------------------
hausslerlab 1.8T 15T 16T 00 [------]
== Actually Doing Work and Computing ==
When doing research, running jobs and the like, please be careful of your resource consumption on the server you are on. Don't run too many threads or cores at once if such a thing overruns the RAM available or the disk IO available. If you are not sure of your potential RAM, CPU or disk impact, start small with one or two processes and work your way up from there. Also, before running your stuff, check what else is already happening on the server by using the 'top' command to see who else and what else is running and what kind of resources are already being consumed. If, after starting a process, you realize that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== The Firewall ==
All servers are behind a firewall in this environment, and as such, you must connect to the VPN in order to access them. They will not be accessible from the greater Internet without VPN. Although you will be able to connect outbound from them to other servers on the internet to copy data in, sync git repos, stuff like that. It is only inbound connections that will be blocked. All machines behind the firewall have the private domain name suffix of "*.prism".
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear in the event of a disk failure or anything else. Do not store important data there. If it is important, it should be moved somewhere else very soon after creation.
== The Phoenix Cluster ==
This is a cluster of ~20 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2TB RAM and 128 cores, although the cluster is heterogeneous and has multiple node types.
The cluster head node, from which all jobs are submitted via the SLURM job scheduling framework, is """phoenix.prism""". To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
ffc8abc1d25c5f2acf930a91b4c7bf868729bf87
336
335
2023-06-12T22:01:15Z
Weiler
3
/* The Phoenix Cluster */
wikitext
text/x-wiki
Before you can access the firewalled environment (Prism), you must get VPN access to it, which is detailed here:
[[Requirement for users to get GI VPN access]]
== Account and Storage Cost ==
Costs for having an active UNIX account and for storage (per TB) are listed in this document, under "Genomics Project Support", specifically "Genomics IT Systems User Support per user" and "Genomics Data Storage per TB":
https://planning.ucsc.edu/budget/rates-and-assessments/recharge-rates/docs/2021-22-approved-recharge-rates.pdf
== Account Expiration ==
Your UNIX account will have an expiration date associated with it after creation. If your account was created in January, February or March, then your account expiration date will be July 1st of the '''current year'''. If the account was created after March, then your expiration date will be July 1st of the '''following year'''.
You will receive notice by email when your account is about to expire. To renew, simply ask the PI that sponsored you (which will be included in the notice) to email '''cluster-admin@soe.ucsc.edu''' requesting that your account to be renewed for another year.
If your account expires, the account will be suspended and you will no longer be able to login or view any data you may have in our systems. Any automated scripts (owned by you) that run via cron or other mechanisms will cease to function.
== Server Types and Management==
After confirming your VPN software is working, you can ssh into one of the compute servers behind the VPN:
'''crimson.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space, CentOS 7.9
'''razzmatazz.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space, Ubuntu 22.04
'''mustard.prism''': 1.5TB RAM, 160 cores, 9TB local scratch space, Ubuntu 22.04
These servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage; home directories and group storage directories. Your home directory will be located as "/private/home/username" and has a 30GB quota. The group storage directories are created per PI, and each group directory has a 15TB quota. For example, if David Haussler is the PI that you report to directly, then the directory would exist as /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each of those group directories are shared by the lab it belongs to, so you must be wary of everyone's data usage and share the 15TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the '/usr/bin/viewquota' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /private/groups/hausslerlab for example, you would do:
$ viewquota hausslerlab
Project quota on /export (/dev/mapper/export)
Project ID Used Soft Hard Warn/Grace
---------- ---------------------------------
hausslerlab 1.8T 15T 16T 00 [------]
== Actually Doing Work and Computing ==
When doing research, running jobs and the like, please be careful of your resource consumption on the server you are on. Don't run too many threads or cores at once if such a thing overruns the RAM available or the disk IO available. If you are not sure of your potential RAM, CPU or disk impact, start small with one or two processes and work your way up from there. Also, before running your stuff, check what else is already happening on the server by using the 'top' command to see who else and what else is running and what kind of resources are already being consumed. If, after starting a process, you realize that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== The Firewall ==
All servers in this environment are behind a firewall, so you must connect to the VPN in order to access them; they are not reachable from the greater Internet without it. You can still make outbound connections from them to other servers on the Internet to copy data in, sync git repos, and the like - only inbound connections are blocked. All machines behind the firewall have the private domain name suffix "*.prism".
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear in the event of a disk failure or other problem. Do not store important data there; if it is important, move it somewhere else soon after creation.
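The safe workflow is to treat /scratch as disposable working space and copy results out immediately. A runnable sketch of that pattern, using mktemp stand-ins rather than the real paths (on a server you would use /scratch and your lab's /private/groups directory):

```shell
# Compute in scratch, then move results to backed-up storage promptly.
# Both directories here are temporary stand-ins so the sketch runs anywhere.
scratch_dir=$(mktemp -d)    # stand-in for a directory under /scratch
durable_dir=$(mktemp -d)    # stand-in for /private/groups/yourlab

echo "final result" > "$scratch_dir/result.txt"   # produced by your job

cp "$scratch_dir/result.txt" "$durable_dir/"      # copy it off scratch right away
rm -rf "$scratch_dir"                             # tidy up scratch when done
```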
== The Phoenix Cluster ==
This is a cluster of ~20 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2TB RAM and 128 cores, although the cluster is heterogeneous and has multiple node types.
The cluster head node, from which all jobs are submitted via the SLURM job scheduling framework, is '''phoenix.prism'''. To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
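As a sketch only - the job name, resource values and output filename below are invented placeholders, not site policy; consult the Slurm page above for the partitions and limits actually in use - a minimal batch script looks like this:

```shell
#!/bin/bash
# Illustrative Slurm batch script. All #SBATCH values are placeholders.
#SBATCH --job-name=example
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=01:00:00
#SBATCH --output=example_%j.log

# Slurm sets SLURM_CPUS_PER_TASK inside a job; default to 1 when this
# script is run outside the scheduler.
cpus="${SLURM_CPUS_PER_TASK:-1}"
echo "Running on $(hostname) with $cpus CPU(s)"
```

Submit it from phoenix.prism with 'sbatch script.sh', and check its status with 'squeue -u $USER'.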
557feb1ebed5a241e834b48c549e7a3fc6b79381
337
336
2023-06-12T22:03:12Z
Weiler
3
/* The Phoenix Cluster */
wikitext
text/x-wiki
Before you can access the firewalled environment (Prism), you must get VPN access to it, which is detailed here:
[[Requirement for users to get GI VPN access]]
== Account and Storage Cost ==
Costs for having an active UNIX account and for storage (per TB) are listed in this document, under "Genomics Project Support", specifically "Genomics IT Systems User Support per user" and "Genomics Data Storage per TB":
https://planning.ucsc.edu/budget/rates-and-assessments/recharge-rates/docs/2021-22-approved-recharge-rates.pdf
== Account Expiration ==
Your UNIX account will have an expiration date associated with it after creation. If your account was created in January, February or March, then your account expiration date will be July 1st of the '''current year'''. If the account was created after March, then your expiration date will be July 1st of the '''following year'''.
You will receive notice by email when your account is about to expire. To renew, simply ask the PI that sponsored you (who will be named in the notice) to email '''cluster-admin@soe.ucsc.edu''' requesting that your account be renewed for another year.
If your account expires, the account will be suspended and you will no longer be able to login or view any data you may have in our systems. Any automated scripts (owned by you) that run via cron or other mechanisms will cease to function.
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the compute servers behind the VPN:
'''crimson.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space, CentOS 7.9
'''razzmatazz.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space, Ubuntu 22.04
'''mustard.prism''': 1.5TB RAM, 160 cores, 9TB local scratch space, Ubuntu 22.04
These servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory is located at /private/home/username and has a 30GB quota. The group storage directories are created per PI, and each group directory has a 15TB quota. For example, if David Haussler is the PI you report to directly, the directory would be /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the '/usr/bin/viewquota' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /private/groups/hausslerlab for example, you would do:
$ viewquota hausslerlab
Project quota on /export (/dev/mapper/export)
Project ID Used Soft Hard Warn/Grace
---------- ---------------------------------
hausslerlab 1.8T 15T 16T 00 [------]
== Actually Doing Work and Computing ==
When doing research, running jobs and the like, please be mindful of your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are unsure of your potential RAM, CPU or disk impact, start small with one or two processes and work your way up from there. Also, before launching your jobs, check what is already happening on the server with the 'top' command to see who and what is running and what resources are already being consumed. If, after starting a process, you notice the server slowing down considerably or becoming unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== The Firewall ==
All servers in this environment are behind a firewall, so you must connect to the VPN in order to access them; they are not reachable from the greater Internet without it. You can still make outbound connections from them to other servers on the Internet to copy data in, sync git repos, and the like - only inbound connections are blocked. All machines behind the firewall have the private domain name suffix "*.prism".
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear in the event of a disk failure or other problem. Do not store important data there; if it is important, move it somewhere else soon after creation.
== The Phoenix Cluster ==
This is a cluster of ~20 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2TB RAM and 128 cores, although the cluster is heterogeneous and has multiple node types.
The cluster head node, from which all jobs are submitted via the SLURM job scheduling framework, is '''phoenix.prism'''. To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
For scratch on the cluster, TMPDIR will be set to /data/tmp. That area is cleaned often so don't store any data there that isn't being used by your jobs.
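A minimal sketch of that pattern, assuming only a POSIX shell (the group-storage path in the comment is an invented example):

```shell
# Keep per-job intermediate files under TMPDIR (/data/tmp on the cluster).
# Falls back to /tmp so the sketch also runs off the cluster.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/myjob.XXXXXX")

echo "intermediate result" > "$workdir/partial.txt"

# Copy anything worth keeping to durable group storage before the job
# ends (the path below is illustrative), then clean up - /data/tmp is
# purged often.
# cp "$workdir"/partial.txt /private/groups/yourlab/results/
rm -rf "$workdir"
```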
c5e19abc732323407ce12fc0540c17a432c0c405
339
337
2023-06-14T22:23:30Z
Weiler
3
wikitext
text/x-wiki
Before you can access the firewalled environment (Prism), you must get VPN access to it, which is detailed here:
[[Requirement for users to get GI VPN access]]
== Account and Storage Cost ==
Costs for having an active UNIX account and for storage (per TB) are listed in this document, under "Genomics Project Support", specifically "Genomics IT Systems User Support per user" and "Genomics Data Storage per TB":
https://planning.ucsc.edu/budget/rates-and-assessments/recharge-rates/docs/2021-22-approved-recharge-rates.pdf
== Account Expiration ==
Your UNIX account will have an expiration date associated with it after creation. If your account was created in January, February or March, then your account expiration date will be July 1st of the '''current year'''. If the account was created after March, then your expiration date will be July 1st of the '''following year'''.
You will receive notice by email when your account is about to expire. To renew, simply ask the PI that sponsored you (who will be named in the notice) to email '''cluster-admin@soe.ucsc.edu''' requesting that your account be renewed for another year.
If your account expires, the account will be suspended and you will no longer be able to login or view any data you may have in our systems. Any automated scripts (owned by you) that run via cron or other mechanisms will cease to function.
2e1f7a995170e96da3d1f86896ffadc8cfd4fec1
Firewalled Environment Storage Overview
0
39
341
2023-06-14T22:37:46Z
Weiler
3
Created page with "== Server Types and Management== After confirming your VPN software is working, you can ssh into one of the compute servers behind the VPN: '''crimson.prism''': 256GB RAM,..."
wikitext
text/x-wiki
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the compute servers behind the VPN:
'''crimson.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space, CentOS 7.9
'''razzmatazz.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space, Ubuntu 22.04
'''mustard.prism''': 1.5TB RAM, 160 cores, 9TB local scratch space, Ubuntu 22.04
These servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories.
'''Home Directories (/private/home/username)'''
Your home directory is located at /private/home/username and has a 30GB soft quota and a 31GB hard quota. It is meant for small scripts, login data, or a git repo. Please do not store large data there or run large compute jobs against data in your home directory.
'''Group Directories (/private/groups/groupname)'''
The group storage directories are created per PI, and each group directory has a default 15TB soft quota and a 16TB hard quota. For example, if David Haussler is the PI you report to directly, the directory would be /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the '/usr/bin/viewquota' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /private/groups/hausslerlab for example, you would do:
$ viewquota hausslerlab
Project quota on /export (/dev/mapper/export)
Project ID Used Soft Hard Warn/Grace
---------- ---------------------------------
hausslerlab 1.8T 15T 16T 00 [------]
'''Soft Versus Hard Quotas'''
We use soft and hard quotas for disk space.
Once you exceed a directory's soft quota, a one-week countdown timer starts. When that timer runs out, you will no longer be able to create new files or write more data in that directory. You can reset the countdown timer by dropping down to under the soft quota limit.
You will not be permitted to exceed a directory's hard quota at all. Any attempt to try will produce an error; the precise error will depend on how your software responds to running out of disk space.
When quotas are first applied to a directory, or are reduced, it is possible to end up with more data or files in the directory than the quota allows for. This outcome does not trigger deletion of any existing data, but will prevent creation of new data or files.
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear in the event of a disk failure or other problem. Do not store important data there; if it is important, move it somewhere else soon after creation.
== Actually Doing Work and Computing ==
When doing research, running jobs and the like, please be mindful of your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are unsure of your potential RAM, CPU or disk impact, start small with one or two processes and work your way up from there. Also, before launching your jobs, check what is already happening on the server with the 'top' command to see who and what is running and what resources are already being consumed. If, after starting a process, you notice the server slowing down considerably or becoming unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== The Firewall ==
All servers in this environment are behind a firewall, so you must connect to the VPN in order to access them; they are not reachable from the greater Internet without it. You can still make outbound connections from them to other servers on the Internet to copy data in, sync git repos, and the like - only inbound connections are blocked. All machines behind the firewall have the private domain name suffix "*.prism".
== The Phoenix Cluster ==
This is a cluster of ~20 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2TB RAM and 128 cores, although the cluster is heterogeneous and has multiple node types.
The cluster head node, from which all jobs are submitted via the SLURM job scheduling framework, is '''phoenix.prism'''. To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
For scratch on the cluster, TMPDIR will be set to /data/tmp. That area is cleaned often so don't store any data there that isn't being used by your jobs.
0376e579e59479ae6e98e8eaf8758e60929f98d6
342
341
2023-06-14T22:48:30Z
Weiler
3
wikitext
text/x-wiki
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the compute servers behind the VPN:
'''crimson.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space, CentOS 7.9
'''razzmatazz.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space, Ubuntu 22.04
'''mustard.prism''': 1.5TB RAM, 160 cores, 9TB local scratch space, Ubuntu 22.04
These servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories.
'''Filesystem Specifications'''
{| class="wikitable"
|- style="font-weight:bold;"
! Filesystem<br />
! /private/home
! /private/groups
|-
| style="text-align:center; font-weight:bold;" | Soft Quota
| 30 GB
| 15 TB
|-
| style="text-align:center; font-weight:bold;" | Hard Quota
| 31 GB
| 16 TB
|-
| style="font-weight:bold;" | Total Capacity
| 19 TB
| 500 TB
|-
| style="font-weight:bold;" | Access Speed
| Slow - Moderate (Spinning Disk)
| Very Fast (NVMe Flash Media)
|-
| style="font-weight:bold;" | Intended Use
| This space should be used for login scripts, small bits of code or software repos, etc. No large data should be stored here.
| This space should be used for large computational/shared data, large software installations and the like.
|}
'''Home Directories (/private/home/username)'''
Your home directory is located at /private/home/username and has a 30GB soft quota and a 31GB hard quota. It is meant for small scripts, login data, or a git repo. Please do not store large data there or run large compute jobs against data in your home directory.
'''Group Directories (/private/groups/groupname)'''
The group storage directories are created per PI, and each group directory has a default 15TB soft quota and a 16TB hard quota. For example, if David Haussler is the PI you report to directly, the directory would be /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the '/usr/bin/viewquota' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /private/groups/hausslerlab for example, you would do:
$ viewquota hausslerlab
Project quota on /export (/dev/mapper/export)
Project ID Used Soft Hard Warn/Grace
---------- ---------------------------------
hausslerlab 1.8T 15T 16T 00 [------]
'''Soft Versus Hard Quotas'''
We use soft and hard quotas for disk space.
Once you exceed a directory's soft quota, a one-week countdown timer starts. When that timer runs out, you will no longer be able to create new files or write more data in that directory. You can reset the countdown timer by dropping down to under the soft quota limit.
You will not be permitted to exceed a directory's hard quota at all. Any attempt to try will produce an error; the precise error will depend on how your software responds to running out of disk space.
When quotas are first applied to a directory, or are reduced, it is possible to end up with more data or files in the directory than the quota allows for. This outcome does not trigger deletion of any existing data, but will prevent creation of new data or files.
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear in the event of a disk failure or other problem. Do not store important data there; if it is important, move it somewhere else soon after creation.
== Actually Doing Work and Computing ==
When doing research, running jobs and the like, please be mindful of your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are unsure of your potential RAM, CPU or disk impact, start small with one or two processes and work your way up from there. Also, before launching your jobs, check what is already happening on the server with the 'top' command to see who and what is running and what resources are already being consumed. If, after starting a process, you notice the server slowing down considerably or becoming unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== The Firewall ==
All servers in this environment are behind a firewall, so you must connect to the VPN in order to access them; they are not reachable from the greater Internet without it. You can still make outbound connections from them to other servers on the Internet to copy data in, sync git repos, and the like - only inbound connections are blocked. All machines behind the firewall have the private domain name suffix "*.prism".
== The Phoenix Cluster ==
This is a cluster of ~20 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2TB RAM and 128 cores, although the cluster is heterogeneous and has multiple node types.
The cluster head node, from which all jobs are submitted via the SLURM job scheduling framework, is '''phoenix.prism'''. To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
For scratch on the cluster, TMPDIR will be set to /data/tmp. That area is cleaned often so don't store any data there that isn't being used by your jobs.
2a61e43a52e41c22dbeac689acfbd09036a45c4f
343
342
2023-06-14T22:50:59Z
Weiler
3
/* Storage */
wikitext
text/x-wiki
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the compute servers behind the VPN:
'''crimson.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space, CentOS 7.9
'''razzmatazz.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space, Ubuntu 22.04
'''mustard.prism''': 1.5TB RAM, 160 cores, 9TB local scratch space, Ubuntu 22.04
These servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories.
'''Filesystem Specifications'''
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Filesystem<br />
! /private/home
! /private/groups
|-
| style="font-weight:bold; text-align:left;" | Default Soft Quota
| 30 GB
| 15 TB
|-
| style="font-weight:bold;" | Default Hard Quota
| 31 GB
| 16 TB
|-
| style="font-weight:bold; text-align:left;" | Total Capacity
| 19 TB
| 500 TB
|- style="text-align:left;"
| style="font-weight:bold;" | Access Speed
| Slow - Moderate (Spinning Disk)
| Very Fast (NVMe Flash Media)
|- style="text-align:left;"
| style="font-weight:bold;" | Intended Use
| This space should be used for login scripts, small bits of code or software repos, etc. No large data should be stored here.
| This space should be used for large computational/shared data, large software installations and the like.
|}
'''Home Directories (/private/home/username)'''
Your home directory is located at /private/home/username and has a 30GB soft quota and a 31GB hard quota. It is meant for small scripts, login data, or a git repo. Please do not store large data there or run large compute jobs against data in your home directory.
'''Group Directories (/private/groups/groupname)'''
The group storage directories are created per PI, and each group directory has a default 15TB soft quota and a 16TB hard quota. For example, if David Haussler is the PI you report to directly, the directory would be /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the '/usr/bin/viewquota' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /private/groups/hausslerlab for example, you would do:
$ viewquota hausslerlab
Project quota on /export (/dev/mapper/export)
Project ID Used Soft Hard Warn/Grace
---------- ---------------------------------
hausslerlab 1.8T 15T 16T 00 [------]
'''Soft Versus Hard Quotas'''
We use soft and hard quotas for disk space.
Once you exceed a directory's soft quota, a one-week countdown timer starts. When that timer runs out, you will no longer be able to create new files or write more data in that directory. You can reset the countdown timer by dropping down to under the soft quota limit.
You will not be permitted to exceed a directory's hard quota at all. Any attempt to try will produce an error; the precise error will depend on how your software responds to running out of disk space.
When quotas are first applied to a directory, or are reduced, it is possible to end up with more data or files in the directory than the quota allows for. This outcome does not trigger deletion of any existing data, but will prevent creation of new data or files.
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear in the event of a disk failure or other problem. Do not store important data there; if it is important, move it somewhere else soon after creation.
== Actually Doing Work and Computing ==
When doing research, running jobs and the like, please be mindful of your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are unsure of your potential RAM, CPU or disk impact, start small with one or two processes and work your way up from there. Also, before launching your jobs, check what is already happening on the server with the 'top' command to see who and what is running and what resources are already being consumed. If, after starting a process, you notice the server slowing down considerably or becoming unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== The Firewall ==
All servers in this environment are behind a firewall, so you must connect to the VPN in order to access them; they are not reachable from the greater Internet without it. You can still make outbound connections from them to other servers on the Internet to copy data in, sync git repos, and the like - only inbound connections are blocked. All machines behind the firewall have the private domain name suffix "*.prism".
== The Phoenix Cluster ==
This is a cluster of ~20 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2TB RAM and 128 cores, although the cluster is heterogeneous and has multiple node types.
The cluster head node, from which all jobs are submitted via the SLURM job scheduling framework, is '''phoenix.prism'''. To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
For scratch on the cluster, TMPDIR will be set to /data/tmp. That area is cleaned often so don't store any data there that isn't being used by your jobs.
40b8493a1a032bf80335339962d4ad7d7828d4fe
344
343
2023-06-14T22:52:02Z
Weiler
3
/* Storage */
wikitext
text/x-wiki
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the compute servers behind the VPN:
'''crimson.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space, CentOS 7.9
'''razzmatazz.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space, Ubuntu 22.04
'''mustard.prism''': 1.5TB RAM, 160 cores, 9TB local scratch space, Ubuntu 22.04
These servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories.
'''Filesystem Specifications'''
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Filesystem<br />
! /private/home
! /private/groups
|-
| style="font-weight:bold; text-align:left;" | Default Soft Quota
| 30 GB
| 15 TB
|-
| style="font-weight:bold; text-align:left;" | Default Hard Quota
| 31 GB
| 16 TB
|-
| style="font-weight:bold; text-align:left;" | Total Capacity
| 19 TB
| 500 TB
|- style="text-align:left;"
| style="font-weight:bold;" | Access Speed
| Slow - Moderate (Spinning Disk)
| Very Fast (NVMe Flash Media)
|- style="text-align:left;"
| style="font-weight:bold;" | Intended Use
| This space should be used for login scripts, small bits of code or software repos, etc. No large data should be stored here.
| This space should be used for large computational/shared data, large software installations and the like.
|}
'''Home Directories (/private/home/username)'''
Your home directory is located at /private/home/username and has a 30GB soft quota and a 31GB hard quota. It is meant for small scripts, login data, or a git repo. Please do not store large data there or run large compute jobs against data in your home directory.
'''Group Directories (/private/groups/groupname)'''
The group storage directories are created per PI, and each group directory has a default 15TB soft quota and a 16TB hard quota. For example, if David Haussler is the PI you report to directly, the directory would be /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the '/usr/bin/viewquota' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /private/groups/hausslerlab for example, you would do:
$ viewquota hausslerlab
Project quota on /export (/dev/mapper/export)
Project ID Used Soft Hard Warn/Grace
---------- ---------------------------------
hausslerlab 1.8T 15T 16T 00 [------]
'''Soft Versus Hard Quotas'''
We use soft and hard quotas for disk space.
Once you exceed a directory's soft quota, a one-week countdown timer starts. When that timer runs out, you will no longer be able to create new files or write more data in that directory. You can reset the countdown timer by dropping down to under the soft quota limit.
You will not be permitted to exceed a directory's hard quota at all. Any attempt to try will produce an error; the precise error will depend on how your software responds to running out of disk space.
When quotas are first applied to a directory, or are reduced, it is possible to end up with more data or files in the directory than the quota allows for. This outcome does not trigger deletion of any existing data, but will prevent creation of new data or files.
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear in the event of a disk failure or other problem. Do not store important data there; if it is important, move it somewhere else soon after creation.
== Actually Doing Work and Computing ==
When doing research, running jobs and the like, please be mindful of your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are unsure of your potential RAM, CPU or disk impact, start small with one or two processes and work your way up from there. Also, before launching your jobs, check what is already happening on the server with the 'top' command to see who and what is running and what resources are already being consumed. If, after starting a process, you notice the server slowing down considerably or becoming unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== The Firewall ==
All servers in this environment are behind a firewall, so you must connect to the VPN in order to access them; they are not reachable from the greater Internet without it. You can still make outbound connections from them to other servers on the Internet to copy data in, sync git repos, and the like - only inbound connections are blocked. All machines behind the firewall have the private domain name suffix "*.prism".
== The Phoenix Cluster ==
This is a cluster of ~20 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2TB RAM and 128 cores, although the cluster is heterogeneous and has multiple node types.
The cluster head node, from which all jobs are submitted via the SLURM job scheduling framework, is '''phoenix.prism'''. To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
For scratch on the cluster, TMPDIR will be set to /data/tmp. That area is cleaned often, so don't store any data there that isn't actively being used by your jobs.
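As a sketch, a minimal batch script that keeps its intermediates under the Slurm-provided TMPDIR might look like this; the job name and resource numbers are illustrative, not recommendations:

```shell
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=01:00:00
# Submit from phoenix.prism with: sbatch example.sh

# On the cluster, Slurm sets TMPDIR to /data/tmp; fall back to /tmp so
# the script can also be exercised outside Slurm.
workdir="${TMPDIR:-/tmp}/example.$$"
mkdir -p "$workdir"

# ... real computation goes here, writing intermediates under "$workdir" ...
echo "intermediates in $workdir"

# /data/tmp is cleaned often, but remove your own files anyway.
rm -rf "$workdir"
```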
357eef316ce52bb2f2a1ca0882805561fbb51f40
345
344
2023-06-14T22:54:50Z
Weiler
3
wikitext
text/x-wiki
== Server Types and Management==
After confirming your VPN software is working, you can ssh into one of the compute servers behind the VPN:
'''crimson.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space, CentOS 7.9
'''razzmatazz.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space, Ubuntu 22.04
'''mustard.prism''': 1.5TB RAM, 160 cores, 9TB local scratch space, Ubuntu 22.04
These servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories.
'''Filesystem Specifications'''
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Filesystem<br />
! /private/home
! /private/groups
|-
| style="font-weight:bold; text-align:left;" | Default Soft Quota
| 30 GB
| 15 TB
|-
| style="font-weight:bold; text-align:left;" | Default Hard Quota
| 31 GB
| 16 TB
|-
| style="font-weight:bold; text-align:left;" | Total Capacity
| 19 TB
| 500 TB
|- style="text-align:left;"
| style="font-weight:bold;" | Access Speed
| Slow - Moderate (Spinning Disk)
| Very Fast (NVMe Flash Media)
|- style="text-align:left;"
| style="font-weight:bold;" | Intended Use
| This space should be used for login scripts, small bits of code or software repos, etc. No large data should be stored here.
| This space should be used for large computational/shared data, large software installations and the like.
|}
'''Home Directories (/private/home/username)'''
Your home directory is located at "/private/home/username" and has a 30GB soft quota and a 31GB hard quota. It is meant for small scripts, login data, or a git repo. Please do not store large data there or run large jobs against data in your home directory.
'''Group Directories (/private/groups/groupname)'''
The group storage directories are created per PI, and each has a default 15TB soft quota and 16TB hard quota. For example, if David Haussler is the PI you report to directly, the directory would be /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the '/usr/bin/viewquota' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /private/groups/hausslerlab for example, you would do:
 $ viewquota hausslerlab
 Project quota on /export (/dev/mapper/export)
 Project ID Used Soft Hard Warn/Grace
 ---------- ---------------------------------
 hausslerlab 1.8T 15T 16T 00 [------]
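If viewquota shows you are close to a limit, ordinary tools can show where the space is going. This is a sketch; GROUP_DIR is illustrative and should point at your own group's path (it defaults to the example group above):

```shell
#!/bin/bash
# Sketch: list the largest immediate subdirectories of a group directory.
GROUP_DIR="${GROUP_DIR:-/private/groups/hausslerlab}"
# GNU du: human-readable sizes, one level deep, largest first.
du -h --max-depth=1 "$GROUP_DIR" 2>/dev/null | sort -rh | head -n 10
```

Note that du over many terabytes can take a while and generates disk I/O of its own, so run it sparingly on shared storage.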
'''Soft Versus Hard Quotas'''
We use soft and hard quotas for disk space.
Once you exceed a directory's soft quota, a one-week countdown timer starts. When that timer runs out, you will no longer be able to create new files or write more data in that directory. You can reset the countdown timer by dropping down to under the soft quota limit.
You will not be permitted to exceed a directory's hard quota at all. Any attempt to try will produce an error; the precise error will depend on how your software responds to running out of disk space.
When quotas are first applied to a directory, or are reduced, it is possible to end up with more data or files in the directory than the quota allows for. This outcome does not trigger deletion of any existing data, but will prevent creation of new data or files.
== /scratch Space on the Servers ==
Each server generally has a local /scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear in the event of a disk failure or other problem. Do not store important data there; if it is important, move it somewhere else soon after creation.
556008ab83e2de08a75ed89e0aac0fcaf88c7a5a
348
345
2023-06-14T22:56:19Z
Weiler
3
/* Server Types and Management */
wikitext
text/x-wiki
== Storage ==
These servers mount two types of storage: home directories and group storage directories.
'''Filesystem Specifications'''
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Filesystem<br />
! /private/home
! /private/groups
|-
| style="font-weight:bold; text-align:left;" | Default Soft Quota
| 30 GB
| 15 TB
|-
| style="font-weight:bold; text-align:left;" | Default Hard Quota
| 31 GB
| 16 TB
|-
| style="font-weight:bold; text-align:left;" | Total Capacity
| 19 TB
| 500 TB
|- style="text-align:left;"
| style="font-weight:bold;" | Access Speed
| Slow - Moderate (Spinning Disk)
| Very Fast (NVMe Flash Media)
|- style="text-align:left;"
| style="font-weight:bold;" | Intended Use
| This space should be used for login scripts, small bits of code or software repos, etc. No large data should be stored here.
| This space should be used for large computational/shared data, large software installations and the like.
|}
'''Home Directories (/private/home/username)'''
Your home directory is located at "/private/home/username" and has a 30GB soft quota and a 31GB hard quota. It is meant for small scripts, login data, or a git repo. Please do not store large data there or run large jobs against data in your home directory.
'''Group Directories (/private/groups/groupname)'''
The group storage directories are created per PI, and each has a default 15TB soft quota and 16TB hard quota. For example, if David Haussler is the PI you report to directly, the directory would be /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the '/usr/bin/viewquota' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /private/groups/hausslerlab for example, you would do:
 $ viewquota hausslerlab
 Project quota on /export (/dev/mapper/export)
 Project ID Used Soft Hard Warn/Grace
 ---------- ---------------------------------
 hausslerlab 1.8T 15T 16T 00 [------]
'''Soft Versus Hard Quotas'''
We use soft and hard quotas for disk space.
Once you exceed a directory's soft quota, a one-week countdown timer starts. When that timer runs out, you will no longer be able to create new files or write more data in that directory. You can reset the countdown timer by dropping down to under the soft quota limit.
You will not be permitted to exceed a directory's hard quota at all. Any attempt to try will produce an error; the precise error will depend on how your software responds to running out of disk space.
When quotas are first applied to a directory, or are reduced, it is possible to end up with more data or files in the directory than the quota allows for. This outcome does not trigger deletion of any existing data, but will prevent creation of new data or files.
== /scratch Space on the Servers ==
Each server generally has a local /scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear in the event of a disk failure or other problem. Do not store important data there; if it is important, move it somewhere else soon after creation.
a8ae851de4ed072cbbaa4c363b5e273311b767f3
Computing Resources Overview
0
40
347
2023-06-14T22:55:43Z
Weiler
3
Created page with "== Doing Work and Computing == When doing research, running jobs and the like, please be careful of your resource consumption on the server you are on. Don't run too many th..."
wikitext
text/x-wiki
== Doing Work and Computing ==
When doing research, running jobs, and the like, please be mindful of your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are not sure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Also, before running your jobs, check what else is already happening on the server with the 'top' command to see who and what is running and what resources are already being consumed. If, after starting a process, you realize that the server has slowed down considerably or become unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== The Firewall ==
All servers in this environment are behind a firewall, and as such, you must connect to the VPN in order to access them; they will not be accessible from the greater Internet without it. You will, however, still be able to make outbound connections from them to other servers on the Internet to copy data in, sync git repos, and the like - it is only inbound connections that are blocked. All machines behind the firewall have the private domain name suffix of "*.prism".
== The Phoenix Cluster ==
This is a cluster of ~20 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2TB RAM and 128 cores, although the cluster is heterogeneous and has multiple node types.
The cluster head node, from which all jobs are submitted via the SLURM job scheduling framework, is '''phoenix.prism'''. To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
For scratch on the cluster, TMPDIR will be set to /data/tmp. That area is cleaned often, so don't store any data there that isn't actively being used by your jobs.
f4760b5e4dcd10c3a04d180b74f7af1c50d28c84
349
347
2023-06-14T22:57:02Z
Weiler
3
wikitext
text/x-wiki
== Doing Work and Computing ==
When doing research, running jobs, and the like, please be mindful of your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are not sure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Also, before running your jobs, check what else is already happening on the server with the 'top' command to see who and what is running and what resources are already being consumed. If, after starting a process, you realize that the server has slowed down considerably or become unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the compute servers behind the VPN:
'''crimson.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space, CentOS 7.9
'''razzmatazz.prism''': 256GB RAM, 32 cores, 5.5TB local scratch space, Ubuntu 22.04
'''mustard.prism''': 1.5TB RAM, 160 cores, 9TB local scratch space, Ubuntu 22.04
These servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== The Firewall ==
All servers in this environment are behind a firewall, and as such, you must connect to the VPN in order to access them; they will not be accessible from the greater Internet without it. You will, however, still be able to make outbound connections from them to other servers on the Internet to copy data in, sync git repos, and the like - it is only inbound connections that are blocked. All machines behind the firewall have the private domain name suffix of "*.prism".
== The Phoenix Cluster ==
This is a cluster of ~20 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2TB RAM and 128 cores, although the cluster is heterogeneous and has multiple node types.
The cluster head node, from which all jobs are submitted via the SLURM job scheduling framework, is '''phoenix.prism'''. To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
For scratch on the cluster, TMPDIR will be set to /data/tmp. That area is cleaned often, so don't store any data there that isn't actively being used by your jobs.
0f7053a6338026220b2156b57fedb43a28b0d6f4
350
349
2023-06-14T23:03:58Z
Weiler
3
/* Server Types and Management */
wikitext
text/x-wiki
== Doing Work and Computing ==
When doing research, running jobs, and the like, please be mindful of your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are not sure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Also, before running your jobs, check what else is already happening on the server with the 'top' command to see who and what is running and what resources are already being consumed. If, after starting a process, you realize that the server has slowed down considerably or become unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the compute servers behind the VPN:
{| class="wikitable"
|- style="font-weight:bold;"
! Server Name
! Operating System<br />
! CPU Cores
! Memory
! Network Bandwidth
! Scratch Space
|-
| mustard
| Ubuntu 22.04
| style="text-align:center;" | 160
| 1.5 TB
| 10 Gb/s
| 9 TB
|-
| crimson
| CentOS 7.5
| style="text-align:center;" | 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|-
| razzmatazz
| Ubuntu 22.04
| style="text-align:center;" | 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|}
These servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== The Firewall ==
All servers in this environment are behind a firewall, and as such, you must connect to the VPN in order to access them; they will not be accessible from the greater Internet without it. You will, however, still be able to make outbound connections from them to other servers on the Internet to copy data in, sync git repos, and the like - it is only inbound connections that are blocked. All machines behind the firewall have the private domain name suffix of "*.prism".
== The Phoenix Cluster ==
This is a cluster of ~20 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2TB RAM and 128 cores, although the cluster is heterogeneous and has multiple node types.
The cluster head node, from which all jobs are submitted via the SLURM job scheduling framework, is '''phoenix.prism'''. To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
For scratch on the cluster, TMPDIR will be set to /data/tmp. That area is cleaned often, so don't store any data there that isn't actively being used by your jobs.
b83bc7e95b18b182acfa238dce8863e1a4ceacd5
351
350
2023-06-14T23:07:49Z
Weiler
3
/* Server Types and Management */
wikitext
text/x-wiki
== Doing Work and Computing ==
When doing research, running jobs, and the like, please be mindful of your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are not sure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Also, before running your jobs, check what else is already happening on the server with the 'top' command to see who and what is running and what resources are already being consumed. If, after starting a process, you realize that the server has slowed down considerably or become unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the shared compute servers behind the VPN:
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Node Name
! Operating System<br />
! CPU Cores
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | mustard
| style="text-align:left;" | Ubuntu 22.04
| 160
| 1.5 TB
| 10 Gb/s
| 9 TB
|-
| style="text-align:left;" | crimson
| style="text-align:left;" | CentOS 7.5
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|-
| style="text-align:left;" | razzmatazz
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|}
These ''shared'' servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== The Firewall ==
All servers in this environment are behind a firewall, and as such, you must connect to the VPN in order to access them; they will not be accessible from the greater Internet without it. You will, however, still be able to make outbound connections from them to other servers on the Internet to copy data in, sync git repos, and the like - it is only inbound connections that are blocked. All machines behind the firewall have the private domain name suffix of "*.prism".
== The Phoenix Cluster ==
This is a cluster of ~20 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2TB RAM and 128 cores, although the cluster is heterogeneous and has multiple node types.
The cluster head node, from which all jobs are submitted via the SLURM job scheduling framework, is '''phoenix.prism'''. To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
For scratch on the cluster, TMPDIR will be set to /data/tmp. That area is cleaned often, so don't store any data there that isn't actively being used by your jobs.
e81830a637aea40ba170002d7463fe0a6c7ec971
Computing Resources Overview
0
40
352
351
2023-06-14T23:17:24Z
Weiler
3
/* The Phoenix Cluster */
wikitext
text/x-wiki
== Doing Work and Computing ==
When doing research, running jobs, and the like, please be mindful of your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are not sure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Also, before running your jobs, check what else is already happening on the server with the 'top' command to see who and what is running and what resources are already being consumed. If, after starting a process, you realize that the server has slowed down considerably or become unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the shared compute servers behind the VPN:
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Node Name
! Operating System<br />
! CPU Cores
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | mustard
| style="text-align:left;" | Ubuntu 22.04
| 160
| 1.5 TB
| 10 Gb/s
| 9 TB
|-
| style="text-align:left;" | crimson
| style="text-align:left;" | CentOS 7.5
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|-
| style="text-align:left;" | razzmatazz
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|}
These ''shared'' servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== The Firewall ==
All servers in this environment are behind a firewall, and as such, you must connect to the VPN in order to access them; they will not be accessible from the greater Internet without it. You will, however, still be able to make outbound connections from them to other servers on the Internet to copy data in, sync git repos, and the like - it is only inbound connections that are blocked. All machines behind the firewall have the private domain name suffix of "*.prism".
== The Phoenix Cluster ==
This is a cluster of ~20 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2TB RAM and 128 cores, although the cluster is heterogeneous and has multiple node types.
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold;"
! Node Name
! Operating System<br />
! CPU Cores
! GPUs/Type
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | phoenix-00
| style="text-align:left;" | Ubuntu 22.04
| 256
| 8 / nVidia A100
| 1 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[01-05]
| style="text-align:left;" | Ubuntu 22.04
| 128
| 8 / nVidia A5500
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[06-08]
| style="text-align:left;" | Ubuntu 22.04
| 128
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-09
| style="text-align:left;" | Ubuntu 22.04
| 40
| 8 / nVidia 1080ti
| 256 GB
| 10 Gb/s
| 1.5 TB NVMe
|-
| style="text-align:left;" | phoenix-10
| style="text-align:left;" | Ubuntu 22.04
| 36
| 4 / nVidia 2080ti
| 385 GB
| 10 Gb/s
| 3 TB NVMe
|-
| style="text-align:left;" | phoenix-[11-20]
| style="text-align:left;" | Ubuntu 22.04
| 128
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|}
The cluster head node, from which all jobs are submitted via the SLURM job scheduling framework, is '''phoenix.prism'''. To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
For scratch on the cluster, TMPDIR will be set to /data/tmp. That area is cleaned often, so don't store any data there that isn't actively being used by your jobs.
4640cadc8eb15f88448b64e5ebccb861bfdac95c
Genomics Institute Computing Information
0
6
353
346
2023-06-14T23:19:30Z
Weiler
3
wikitext
text/x-wiki
Welcome to the Genomics Institute Computing Information Repository! Browse the topics below for help in the area you are curious about.
==GI Public Computing Environment==
*[[How to access the public servers]]
==GI Firewalled Computing Environment (PRISM)==
*[[Access to the Firewalled Compute Servers]]
*[[Firewalled Computing Resources Overview]]
*[[Firewalled Environment Storage Overview]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[GPU Resources]]
*[[Quick Reference Guide]]
*[[Slurm Tips for vg]]
*[[Slurm Tips for Toil]]
==Kubernetes Information==
*[[Computational Genomics Kubernetes Installation]]
*[[Undiagnosed Disease Project Kubernetes Installation]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
ab4a48885cea5a7b5207452e78cb0a5d7b6518ed
355
353
2023-06-14T23:21:40Z
Weiler
3
wikitext
text/x-wiki
Welcome to the Genomics Institute Computing Information Repository! Browse the topics below for help in the area you are curious about.
== GI Public Computing Environment ==
*[[How to access the public servers]]
== GI Firewalled Computing Environment (PRISM) ==
*[[Access to the Firewalled Compute Servers]]
*[[Firewalled Computing Resources Overview]]
*[[Firewalled Environment Storage Overview]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[GPU Resources]]
*[[Quick Reference Guide]]
*[[Slurm Tips for vg]]
*[[Slurm Tips for Toil]]
==Kubernetes Information==
*[[Computational Genomics Kubernetes Installation]]
*[[Undiagnosed Disease Project Kubernetes Installation]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
bf78e1af113f197fa0af99ad4dba96ee404ef5de
362
355
2023-06-14T23:28:02Z
Weiler
3
/* GI Firewalled Computing Environment (PRISM) */
wikitext
text/x-wiki
Welcome to the Genomics Institute Computing Information Repository! Browse the topics below for help in the area you are curious about.
== GI Public Computing Environment ==
*[[How to access the public servers]]
== GI Firewalled Computing Environment (PRISM) ==
*[[Access to the Firewalled Compute Servers]]
*[[Firewalled Computing Resources Overview]]
*[[Firewalled Environment Storage Overview]]
*[[Firewalled Storage Cost]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[GPU Resources]]
*[[Quick Reference Guide]]
*[[Slurm Tips for vg]]
*[[Slurm Tips for Toil]]
==Kubernetes Information==
*[[Computational Genomics Kubernetes Installation]]
*[[Undiagnosed Disease Project Kubernetes Installation]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
eeac2f6477f9616bc0c3f88556d8ccac50dec18f
366
362
2023-06-14T23:35:18Z
Weiler
3
wikitext
text/x-wiki
Welcome to the Genomics Institute Computing Information Repository! Browse the topics below for help in the area you are curious about.
== GI Public Computing Environment ==
*[[How to access the public servers]]
== GI Firewalled Computing Environment (PRISM) ==
*[[Access to the Firewalled Compute Servers]]
*[[Firewalled Computing Resources Overview]]
*[[Firewalled Environment Storage Overview]]
*[[Firewalled User Account and Storage Cost]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[GPU Resources]]
*[[Quick Reference Guide]]
*[[Slurm Tips for vg]]
*[[Slurm Tips for Toil]]
==Kubernetes Information==
*[[Computational Genomics Kubernetes Installation]]
*[[Undiagnosed Disease Project Kubernetes Installation]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
5f8d916b43a917e980fb97f5c6c8b944f15f8ce4
372
366
2023-06-27T21:06:49Z
Weiler
3
wikitext
text/x-wiki
Welcome to the Genomics Institute Computing Information Repository! Browse the topics below for help in the area you are curious about.
== GI Public Computing Environment ==
*[[How to access the public servers]]
== GI Firewalled Computing Environment (PRISM) ==
*[[Access to the Firewalled Compute Servers]]
*[[Firewalled Computing Resources Overview]]
*[[Firewalled Environment Storage Overview]]
*[[Firewalled User Account and Storage Cost]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[GPU Resources]]
*[[Quick Reference Guide]]
*[[Slurm Tips for vg]]
*[[Slurm Tips for Toil]]
*[[Using Docker under Slurm]]
==Kubernetes Information==
*[[Computational Genomics Kubernetes Installation]]
*[[Undiagnosed Disease Project Kubernetes Installation]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
170c1050868c541613fc683885be775ddacc7045
375
372
2023-06-28T21:46:31Z
Anovak
4
/* Slurm at the Genomics Institute */
wikitext
text/x-wiki
Welcome to the Genomics Institute Computing Information Repository! Browse the topics below for help in the area you are curious about.
== GI Public Computing Environment ==
*[[How to access the public servers]]
== GI Firewalled Computing Environment (PRISM) ==
*[[Access to the Firewalled Compute Servers]]
*[[Firewalled Computing Resources Overview]]
*[[Firewalled Environment Storage Overview]]
*[[Firewalled User Account and Storage Cost]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[GPU Resources]]
*[[Quick Reference Guide]]
*[[Slurm Tips for vg]]
*[[Slurm Tips for Toil]]
*[[Using Docker under Slurm]]
*[[Phoenix WDL Tutorial]]
==Kubernetes Information==
*[[Computational Genomics Kubernetes Installation]]
*[[Undiagnosed Disease Project Kubernetes Installation]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''.
3de61c8046c86b3167d1e08b096ee86a46ca070c
Firewalled Computing Resources Overview
0
41
354
2023-06-14T23:19:38Z
Weiler
3
Created page with "== Doing Work and Computing == When doing research, running jobs and the like, please be careful of your resource consumption on the server you are on. Don't run too many th..."
wikitext
text/x-wiki
== Doing Work and Computing ==
When doing research or running jobs, please be mindful of your resource consumption on the server you are using. Don't run so many threads or processes at once that you overrun the available RAM or disk I/O. If you are not sure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Also, before running your work, check what else is already happening on the server with the 'top' command to see who and what else is running and what resources are already being consumed. If, after starting a process, you realize that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the shared compute servers behind the VPN:
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Node Name
! Operating System<br />
! CPU Cores
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | mustard
| style="text-align:left;" | Ubuntu 22.04
| 160
| 1.5 TB
| 10 Gb/s
| 9 TB
|-
| style="text-align:left;" | crimson
| style="text-align:left;" | CentOS 7.5
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|-
| style="text-align:left;" | razzmatazz
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|}
These ''shared'' servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== The Firewall ==
All servers in this environment are behind a firewall, so you must connect to the VPN in order to access them; they are not reachable from the greater Internet without it. You can still make outbound connections from them to other servers on the Internet to copy data in, sync git repos, and so on - only inbound connections are blocked. All machines behind the firewall have the private domain name suffix "*.prism".
== The Phoenix Cluster ==
This is a cluster of ~20 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2TB RAM and 128 cores, although the cluster is heterogeneous and has multiple node types.
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold;"
! Node Name
! Operating System<br />
! CPU Cores
! GPUs/Type
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | phoenix-00
| style="text-align:left;" | Ubuntu 22.04
| 256
| 8 / nVidia A100
| 1 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[01-05]
| style="text-align:left;" | Ubuntu 22.04
| 128
| 8 / nVidia A5500
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[06-08]
| style="text-align:left;" | Ubuntu 22.04
| 128
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-09
| style="text-align:left;" | Ubuntu 22.04
| 40
| 8 / nVidia 1080ti
| 256 GB
| 10 Gb/s
| 1.5 TB NVMe
|-
| style="text-align:left;" | phoenix-10
| style="text-align:left;" | Ubuntu 22.04
| 36
| 4 / nVidia 2080ti
| 385 GB
| 10 Gb/s
| 3 TB NVMe
|-
| style="text-align:left;" | phoenix-[11-20]
| style="text-align:left;" | Ubuntu 22.04
| 128
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|}
The cluster head node, from which all jobs are submitted via the SLURM job scheduling framework, is '''phoenix.prism'''. To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
For scratch space on the cluster, TMPDIR will be set to /data/tmp. That area is cleaned often, so don't store any data there that isn't actively being used by your jobs.
4640cadc8eb15f88448b64e5ebccb861bfdac95c
368
354
2023-06-14T23:36:51Z
Weiler
3
wikitext
text/x-wiki
== Doing Work and Computing ==
When doing research or running jobs, please be mindful of your resource consumption on the server you are using. Don't run so many threads or processes at once that you overrun the available RAM or disk I/O. If you are not sure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Also, before running your work, check what else is already happening on the server with the 'top' command to see who and what else is running and what resources are already being consumed. If, after starting a process, you realize that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the shared compute servers behind the VPN. The DNS suffix for all machines is ".prism". So, "mustard" would have a full DNS name of "mustard.prism":
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Node Name
! Operating System<br />
! CPU Cores
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | mustard
| style="text-align:left;" | Ubuntu 22.04
| 160
| 1.5 TB
| 10 Gb/s
| 9 TB
|-
| style="text-align:left;" | crimson
| style="text-align:left;" | CentOS 7.5
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|-
| style="text-align:left;" | razzmatazz
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|}
These ''shared'' servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== The Firewall ==
All servers in this environment are behind a firewall, so you must connect to the VPN in order to access them; they are not reachable from the greater Internet without it. You can still make outbound connections from them to other servers on the Internet to copy data in, sync git repos, and so on - only inbound connections are blocked. All machines behind the firewall have the private domain name suffix "*.prism".
== The Phoenix Cluster ==
This is a cluster of ~20 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2TB RAM and 128 cores, although the cluster is heterogeneous and has multiple node types.
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold;"
! Node Name
! Operating System<br />
! CPU Cores
! GPUs/Type
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | phoenix-00
| style="text-align:left;" | Ubuntu 22.04
| 256
| 8 / nVidia A100
| 1 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[01-05]
| style="text-align:left;" | Ubuntu 22.04
| 128
| 8 / nVidia A5500
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[06-08]
| style="text-align:left;" | Ubuntu 22.04
| 128
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-09
| style="text-align:left;" | Ubuntu 22.04
| 40
| 8 / nVidia 1080ti
| 256 GB
| 10 Gb/s
| 1.5 TB NVMe
|-
| style="text-align:left;" | phoenix-10
| style="text-align:left;" | Ubuntu 22.04
| 36
| 4 / nVidia 2080ti
| 385 GB
| 10 Gb/s
| 3 TB NVMe
|-
| style="text-align:left;" | phoenix-[11-20]
| style="text-align:left;" | Ubuntu 22.04
| 128
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|}
The cluster head node, from which all jobs are submitted via the SLURM job scheduling framework, is '''phoenix.prism'''. To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
For scratch space on the cluster, TMPDIR will be set to /data/tmp. That area is cleaned often, so don't store any data there that isn't actively being used by your jobs.
96f6f0dab1427db080142f59e09a96348328c81d
369
368
2023-06-14T23:58:11Z
Weiler
3
/* The Phoenix Cluster */
wikitext
text/x-wiki
== Doing Work and Computing ==
When doing research or running jobs, please be mindful of your resource consumption on the server you are using. Don't run so many threads or processes at once that you overrun the available RAM or disk I/O. If you are not sure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Also, before running your work, check what else is already happening on the server with the 'top' command to see who and what else is running and what resources are already being consumed. If, after starting a process, you realize that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the shared compute servers behind the VPN. The DNS suffix for all machines is ".prism". So, "mustard" would have a full DNS name of "mustard.prism":
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Node Name
! Operating System<br />
! CPU Cores
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | mustard
| style="text-align:left;" | Ubuntu 22.04
| 160
| 1.5 TB
| 10 Gb/s
| 9 TB
|-
| style="text-align:left;" | crimson
| style="text-align:left;" | CentOS 7.5
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|-
| style="text-align:left;" | razzmatazz
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|}
These ''shared'' servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== The Firewall ==
All servers in this environment are behind a firewall, so you must connect to the VPN in order to access them; they are not reachable from the greater Internet without it. You can still make outbound connections from them to other servers on the Internet to copy data in, sync git repos, and so on - only inbound connections are blocked. All machines behind the firewall have the private domain name suffix "*.prism".
== The Phoenix Cluster ==
This is a cluster of ~20 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2TB RAM and 128 cores, although the cluster is heterogeneous and has multiple node types.
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold;"
! Node Name
! Operating System<br />
! CPU Cores
! GPUs/Type
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | phoenix-00
| style="text-align:left;" | Ubuntu 22.04
| 256
| 8 / Nvidia A100
| 1 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[01-05]
| style="text-align:left;" | Ubuntu 22.04
| 128
| 8 / Nvidia A5500
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[06-08]
| style="text-align:left;" | Ubuntu 22.04
| 128
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-09
| style="text-align:left;" | Ubuntu 22.04
| 40
| 8 / Nvidia 1080ti
| 256 GB
| 10 Gb/s
| 1.5 TB NVMe
|-
| style="text-align:left;" | phoenix-10
| style="text-align:left;" | Ubuntu 22.04
| 36
| 4 / Nvidia 2080ti
| 385 GB
| 10 Gb/s
| 3 TB NVMe
|-
| style="text-align:left;" | phoenix-[11-20]
| style="text-align:left;" | Ubuntu 22.04
| 128
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|}
The cluster head node, from which all jobs are submitted via the SLURM job scheduling framework, is '''phoenix.prism'''. To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
For scratch space on the cluster, TMPDIR will be set to /data/tmp. That area is cleaned often, so don't store any data there that isn't actively being used by your jobs.
00419de86b37dbb16ba9acede2ba449a7870f600
379
369
2023-06-29T01:54:29Z
Weiler
3
/* Server Types and Management */
wikitext
text/x-wiki
== Doing Work and Computing ==
When doing research or running jobs, please be mindful of your resource consumption on the server you are using. Don't run so many threads or processes at once that you overrun the available RAM or disk I/O. If you are not sure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Also, before running your work, check what else is already happening on the server with the 'top' command to see who and what else is running and what resources are already being consumed. If, after starting a process, you realize that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the shared compute servers behind the VPN. The DNS suffix for all machines is ".prism". So, "mustard" would have a full DNS name of "mustard.prism":
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Node Name
! Operating System<br />
! CPU Cores
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | mustard
| style="text-align:left;" | Ubuntu 22.04
| 160
| 1.5 TB
| 10 Gb/s
| 9 TB
|-
| style="text-align:left;" | emerald
| style="text-align:left;" | Ubuntu 22.04
| 64
| 1 TB
| 10 Gb/s
| 690 GB
|-
| style="text-align:left;" | crimson
| style="text-align:left;" | CentOS 7.5
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|-
| style="text-align:left;" | razzmatazz
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|}
These ''shared'' servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== The Firewall ==
All servers in this environment are behind a firewall, so you must connect to the VPN in order to access them; they are not reachable from the greater Internet without it. You can still make outbound connections from them to other servers on the Internet to copy data in, sync git repos, and so on - only inbound connections are blocked. All machines behind the firewall have the private domain name suffix "*.prism".
== The Phoenix Cluster ==
This is a cluster of ~20 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2TB RAM and 128 cores, although the cluster is heterogeneous and has multiple node types.
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold;"
! Node Name
! Operating System<br />
! CPU Cores
! GPUs/Type
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | phoenix-00
| style="text-align:left;" | Ubuntu 22.04
| 256
| 8 / Nvidia A100
| 1 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[01-05]
| style="text-align:left;" | Ubuntu 22.04
| 128
| 8 / Nvidia A5500
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[06-08]
| style="text-align:left;" | Ubuntu 22.04
| 128
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-09
| style="text-align:left;" | Ubuntu 22.04
| 40
| 8 / Nvidia 1080ti
| 256 GB
| 10 Gb/s
| 1.5 TB NVMe
|-
| style="text-align:left;" | phoenix-10
| style="text-align:left;" | Ubuntu 22.04
| 36
| 4 / Nvidia 2080ti
| 385 GB
| 10 Gb/s
| 3 TB NVMe
|-
| style="text-align:left;" | phoenix-[11-20]
| style="text-align:left;" | Ubuntu 22.04
| 128
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|}
The cluster head node, from which all jobs are submitted via the SLURM job scheduling framework, is '''phoenix.prism'''. To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
For scratch space on the cluster, TMPDIR will be set to /data/tmp. That area is cleaned often, so don't store any data there that isn't actively being used by your jobs.
471cc445b567b5864a5788f8bc7a3cf3079296a1
380
379
2023-06-29T01:54:42Z
Weiler
3
/* Server Types and Management */
wikitext
text/x-wiki
== Doing Work and Computing ==
When doing research or running jobs, please be mindful of your resource consumption on the server you are using. Don't run so many threads or processes at once that you overrun the available RAM or disk I/O. If you are not sure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Also, before running your work, check what else is already happening on the server with the 'top' command to see who and what else is running and what resources are already being consumed. If, after starting a process, you realize that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the shared compute servers behind the VPN. The DNS suffix for all machines is ".prism". So, "mustard" would have a full DNS name of "mustard.prism":
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Node Name
! Operating System<br />
! CPU Cores
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | mustard
| style="text-align:left;" | Ubuntu 22.04
| 160
| 1.5 TB
| 10 Gb/s
| 9 TB
|-
| style="text-align:left;" | emerald
| style="text-align:left;" | Ubuntu 22.04
| 64
| 1 TB
| 10 Gb/s
| 690 GB
|-
| style="text-align:left;" | crimson
| style="text-align:left;" | CentOS 7.5
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|-
| style="text-align:left;" | razzmatazz
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|}
These ''shared'' servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== The Firewall ==
All servers in this environment are behind a firewall, so you must connect to the VPN in order to access them; they are not reachable from the greater Internet without it. You can still make outbound connections from them to other servers on the Internet to copy data in, sync git repos, and so on - only inbound connections are blocked. All machines behind the firewall have the private domain name suffix "*.prism".
== The Phoenix Cluster ==
This is a cluster of ~20 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2TB RAM and 128 cores, although the cluster is heterogeneous and has multiple node types.
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold;"
! Node Name
! Operating System<br />
! CPU Cores
! GPUs/Type
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | phoenix-00
| style="text-align:left;" | Ubuntu 22.04
| 256
| 8 / Nvidia A100
| 1 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[01-05]
| style="text-align:left;" | Ubuntu 22.04
| 128
| 8 / Nvidia A5500
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[06-08]
| style="text-align:left;" | Ubuntu 22.04
| 128
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-09
| style="text-align:left;" | Ubuntu 22.04
| 40
| 8 / Nvidia 1080ti
| 256 GB
| 10 Gb/s
| 1.5 TB NVMe
|-
| style="text-align:left;" | phoenix-10
| style="text-align:left;" | Ubuntu 22.04
| 36
| 4 / Nvidia 2080ti
| 385 GB
| 10 Gb/s
| 3 TB NVMe
|-
| style="text-align:left;" | phoenix-[11-20]
| style="text-align:left;" | Ubuntu 22.04
| 128
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|}
The cluster head node, from which all jobs are submitted via the SLURM job scheduling framework, is '''phoenix.prism'''. To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
For scratch space on the cluster, TMPDIR will be set to /data/tmp. That area is cleaned often, so don't store any data there that isn't actively being used by your jobs.
c69246e9ecd06a129a9880d3820b3b15dbd6131d
400
380
2023-07-28T18:17:04Z
Weiler
3
/* Server Types and Management */
wikitext
text/x-wiki
== Doing Work and Computing ==
When doing research or running jobs, please be mindful of your resource consumption on the server you are using. Don't run so many threads or processes at once that you overrun the available RAM or disk I/O. If you are not sure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Also, before running your work, check what else is already happening on the server with the 'top' command to see who and what else is running and what resources are already being consumed. If, after starting a process, you realize that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
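A quick pre-flight check along these lines might look like the following (a minimal sketch using standard Linux tools; nothing here is site-specific):

```shell
# See what the machine is already doing before launching your own work.
nproc                  # number of CPU cores on this host
free -h | head -n 2    # memory in use vs. available
uptime                 # load averages; compare against the core count
# A one-shot snapshot of the busiest processes and their owners:
top -b -n 1 | head -n 15
```

If the load average is already near the core count, or free memory is low, scale your plans down accordingly.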
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the shared compute servers behind the VPN. The DNS suffix for all machines is ".prism". So, "mustard" would have a full DNS name of "mustard.prism":
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Node Name
! Operating System<br />
! CPU Cores
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | mustard
| style="text-align:left;" | Ubuntu 22.04
| 160
| 1.5 TB
| 10 Gb/s
| 9 TB
|-
| style="text-align:left;" | emerald
| style="text-align:left;" | Ubuntu 22.04
| 64
| 1 TB
| 10 Gb/s
| 690 GB
|-
| style="text-align:left;" | crimson
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|-
| style="text-align:left;" | razzmatazz
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|}
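For example, connecting to one of the servers above once the VPN is up (a sketch; substitute your own username):

```shell
# Each short hostname gets the ".prism" suffix to form its full DNS name.
for host in mustard emerald crimson razzmatazz; do
    echo "${host}.prism"
done
# With the VPN connected, you would then log in with, e.g.:
#   ssh yourusername@mustard.prism
```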
These ''shared'' servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== The Firewall ==
All servers in this environment are behind a firewall, so you must connect to the VPN in order to access them; they are not reachable from the greater Internet without it. You can still make outbound connections from them to other servers on the Internet to copy data in, sync git repos, and so on - only inbound connections are blocked. All machines behind the firewall have the private domain name suffix "*.prism".
== The Phoenix Cluster ==
This is a cluster of ~20 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2TB RAM and 128 cores, although the cluster is heterogeneous and has multiple node types.
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold;"
! Node Name
! Operating System<br />
! CPU Cores
! GPUs/Type
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | phoenix-00
| style="text-align:left;" | Ubuntu 22.04
| 256
| 8 / Nvidia A100
| 1 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[01-05]
| style="text-align:left;" | Ubuntu 22.04
| 128
| 8 / Nvidia A5500
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[06-08]
| style="text-align:left;" | Ubuntu 22.04
| 128
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-09
| style="text-align:left;" | Ubuntu 22.04
| 40
| 8 / Nvidia 1080ti
| 256 GB
| 10 Gb/s
| 1.5 TB NVMe
|-
| style="text-align:left;" | phoenix-10
| style="text-align:left;" | Ubuntu 22.04
| 36
| 4 / Nvidia 2080ti
| 385 GB
| 10 Gb/s
| 3 TB NVMe
|-
| style="text-align:left;" | phoenix-[11-20]
| style="text-align:left;" | Ubuntu 22.04
| 128
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|}
The cluster head node, from which all jobs are submitted via the SLURM job scheduling framework, is '''phoenix.prism'''. To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
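For a flavor of what submission looks like, here is a minimal batch-script sketch (the job name and resource values are arbitrary placeholders; see the Slurm pages linked above for real guidance):

```shell
# Write a tiny Slurm batch script; the #SBATCH values are placeholders.
cat > hello_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --time=00:05:00
echo "Hello from $(hostname)"
EOF
# On the head node (phoenix.prism) you would submit it with:
#   sbatch hello_job.sh
```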
For scratch space on the cluster, TMPDIR will be set to /data/tmp. That area is cleaned often, so don't store any data there that isn't actively being used by your jobs.
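A job can use $TMPDIR for its working files along these lines (a sketch; it falls back to /tmp when TMPDIR is unset):

```shell
# Create a private working directory under the node's scratch area.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/myjob.XXXXXX")
echo "scratch dir: $workdir"
# ... write intermediate files under "$workdir" ...
# Clean up when the job finishes, so the periodic cleaner doesn't have to.
rm -rf "$workdir"
```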
cdc59031f99cb87f5a0b0c110f629677410d5a38
Firewalled Environment Storage Overview
0
39
356
348
2023-06-14T23:23:23Z
Weiler
3
wikitext
text/x-wiki
== Storage ==
Our servers mount two types of ''shared'' storage: home directories and group storage directories. Both mount over the network on all shared compute servers and the phoenix cluster, so any server you log in to will have these filesystems available:
'''Filesystem Specifications'''
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Filesystem<br />
! /private/home
! /private/groups
|-
| style="font-weight:bold; text-align:left;" | Default Soft Quota
| 30 GB
| 15 TB
|-
| style="font-weight:bold; text-align:left;" | Default Hard Quota
| 31 GB
| 16 TB
|-
| style="font-weight:bold; text-align:left;" | Total Capacity
| 19 TB
| 500 TB
|- style="text-align:left;"
| style="font-weight:bold;" | Access Speed
| Slow - Moderate (Spinning Disk)
| Very Fast (NVMe Flash Media)
|- style="text-align:left;"
| style="font-weight:bold;" | Intended Use
| This space should be used for login scripts, small bits of code or software repos, etc. No large data should be stored here.
| This space should be used for large computational/shared data, large software installations and the like.
|}
'''Home Directories (/private/home/username)'''
Your home directory is located at /private/home/username and has a 30 GB soft quota and a 31 GB hard quota. It is meant for small scripts, login data, or a git repo. Please do not store large data there or run large compute jobs against data in your home directory.
'''Group Directories (/private/groups/groupname)'''
The group storage directories are created per PI, and each group directory has a default 15 TB soft quota and a 16 TB hard quota. For example, if David Haussler is the PI you report to directly, the directory would exist as /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15 TB available per group accordingly.
On the compute servers you can check your group's current quota usage with the '/usr/bin/viewquota' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). For example, to check the quota usage of /private/groups/hausslerlab, you would run:
 $ viewquota hausslerlab
 Project quota on /export (/dev/mapper/export)
 Project ID    Used   Soft   Hard  Warn/Grace
 ----------  ---------------------------------
 hausslerlab   1.8T    15T    16T  00 [------]
'''Soft Versus Hard Quotas'''
We use soft and hard quotas for disk space.
Once you exceed a directory's soft quota, a one-week countdown timer starts. When that timer runs out, you will no longer be able to create new files or write more data in that directory. You can reset the countdown timer by dropping down to under the soft quota limit.
You will not be permitted to exceed a directory's hard quota at all. Any attempt to try will produce an error; the precise error will depend on how your software responds to running out of disk space.
When quotas are first applied to a directory, or are reduced, it is possible to end up with more data or files in the directory than the quota allows for. This outcome does not trigger deletion of any existing data, but will prevent creation of new data or files.
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear in the event of a disk failure or other problem. Do not store important data there; if it is important, move it somewhere else very soon after creation.
4f06ada435c3b0e7217c2cd4688d2ab6c07ac1c7
357
356
2023-06-14T23:23:45Z
Weiler
3
/* Storage */
wikitext
text/x-wiki
== Storage ==
Our servers mount two types of ''shared'' storage: home directories and group storage directories. Both mount over the network on all shared compute servers and the phoenix cluster, so any server you log in to will have these filesystems available:
'''Filesystem Specifications'''
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Filesystem<br />
! /private/home
! /private/groups
|-
| style="font-weight:bold; text-align:left;" | Default Soft Quota
| 30 GB
| 15 TB
|-
| style="font-weight:bold; text-align:left;" | Default Hard Quota
| 31 GB
| 16 TB
|-
| style="font-weight:bold; text-align:left;" | Total Capacity
| 19 TB
| 500 TB
|- style="text-align:left;"
| style="font-weight:bold;" | Access Speed
| Slow - Moderate (Spinning Disk)
| Very Fast (NVMe Flash Media)
|- style="text-align:left;"
| style="font-weight:bold;" | Intended Use
| This space should be used for login scripts, small bits of code or software repos, etc. No large data should be stored here.
| This space should be used for large computational/shared data, large software installations and the like.
|}
'''Home Directories (/private/home/username)'''
Your home directory is located at /private/home/username and has a 30 GB soft quota and a 31 GB hard quota. It is meant for small scripts, login data, or a git repo. Please do not store large data there or run large compute jobs against data in your home directory.
'''Group Directories (/private/groups/groupname)'''
The group storage directories are created per PI, and each group directory has a default 15 TB soft quota and a 16 TB hard quota. For example, if David Haussler is the PI you report to directly, the directory would exist as /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15 TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the '/usr/bin/viewquota' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /private/groups/hausslerlab for example, you would do:
$ viewquota hausslerlab
Project quota on /export (/dev/mapper/export)
Project ID Used Soft Hard Warn/Grace
---------- ---------------------------------
hausslerlab 1.8T 15T 16T 00 [------]
'''Soft Versus Hard Quotas'''
We use soft and hard quotas for disk space.
Once you exceed a directory's soft quota, a one-week countdown timer starts. When that timer runs out, you will no longer be able to create new files or write more data in that directory. You can reset the countdown timer by dropping down to under the soft quota limit.
You will not be permitted to exceed a directory's hard quota at all. Any attempt will produce an error; the precise error depends on how your software responds to running out of disk space.
When quotas are first applied to a directory, or are reduced, it is possible to end up with more data or files in the directory than the quota allows for. This outcome does not trigger deletion of any existing data, but will prevent creation of new data or files.
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear at any time, for example after a disk failure. Do not store important data there; if data is important, move it somewhere else very soon after creation.
ef8559b4d254a3babd18814e13bc3c3db01ac6b4
358
357
2023-06-14T23:24:20Z
Weiler
3
/* Storage */
wikitext
text/x-wiki
== Storage ==
Our servers mount two types of ''shared'' storage: home directories and group storage directories. Both mount over the network to all shared compute servers and the phoenix cluster, so any server you log into will have these filesystems available:
'''Filesystem Specifications'''
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Filesystem<br />
! /private/home
! /private/groups
|-
| style="font-weight:bold; text-align:left;" | Default Soft Quota
| 30 GB
| 15 TB
|-
| style="font-weight:bold; text-align:left;" | Default Hard Quota
| 31 GB
| 16 TB
|-
| style="font-weight:bold; text-align:left;" | Total Capacity
| 19 TB
| 500 TB
|- style="text-align:left;"
| style="font-weight:bold;" | Access Speed
| Slow - Moderate (Spinning Disk)
| Very Fast (NVMe Flash Media)
|- style="text-align:left;"
| style="font-weight:bold;" | Intended Use
| This space should be used for login scripts, small bits of code or software repos, etc. No large data should be stored here.
| This space should be used for large computational/shared data, large software installations and the like.
|}
'''Home Directories (/private/home/username)'''
Your home directory is located at /private/home/username and has a 30 GB soft quota and a 31 GB hard quota. It is meant for small scripts, login data, or a git repo. Please do not store large data there or run large compute jobs against data in your home directory.
'''Groups Directories (/private/groups/groupname)'''
The group storage directories are created per PI, and each has a default 15 TB soft quota and a 16 TB hard quota. For example, if David Haussler is the PI you report to directly, the directory would be /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15 TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the '/usr/bin/viewquota' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /private/groups/hausslerlab for example, you would do:
$ viewquota hausslerlab
Project quota on /export (/dev/mapper/export)
Project ID Used Soft Hard Warn/Grace
---------- ---------------------------------
hausslerlab 1.8T 15T 16T 00 [------]
'''Soft Versus Hard Quotas'''
We use soft and hard quotas for disk space.
Once you exceed a directory's soft quota, a one-week countdown timer starts. When that timer runs out, you will no longer be able to create new files or write more data in that directory. You can reset the countdown timer by dropping down to under the soft quota limit.
You will not be permitted to exceed a directory's hard quota at all. Any attempt will produce an error; the precise error depends on how your software responds to running out of disk space.
When quotas are first applied to a directory, or are reduced, it is possible to end up with more data or files in the directory than the quota allows for. This outcome does not trigger deletion of any existing data, but will prevent creation of new data or files.
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear at any time, for example after a disk failure. Do not store important data there; if data is important, move it somewhere else very soon after creation.
10342fc927d0b3f3a0458c379b606074e4190a3f
359
358
2023-06-14T23:25:32Z
Weiler
3
/* Storage */
wikitext
text/x-wiki
== Storage ==
Our servers mount two types of ''shared'' storage: home directories and group storage directories. Both mount over the network to all shared compute servers and the phoenix cluster, so any server you log into will have these filesystems available:
'''Filesystem Specifications'''
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Filesystem<br />
! /private/home
! /private/groups
|-
| style="font-weight:bold; text-align:left;" | Default Soft Quota
| 30 GB
| 15 TB
|-
| style="font-weight:bold; text-align:left;" | Default Hard Quota
| 31 GB
| 16 TB
|-
| style="font-weight:bold; text-align:left;" | Total Capacity
| 19 TB
| 500 TB
|- style="text-align:left;"
| style="font-weight:bold;" | Access Speed
| Moderate (Spinning Disk)
| Very Fast (NVMe Flash Media)
|- style="text-align:left;"
| style="font-weight:bold;" | Intended Use
| This space should be used for login scripts, small bits of code or software repos, etc. No large data should be stored here.
| This space should be used for large computational/shared data, large software installations and the like.
|}
'''Home Directories (/private/home/username)'''
Your home directory is located at /private/home/username and has a 30 GB soft quota and a 31 GB hard quota. It is meant for small scripts, login data, or a git repo. Please do not store large data there or run large compute jobs against data in your home directory.
'''Groups Directories (/private/groups/groupname)'''
The group storage directories are created per PI, and each has a default 15 TB soft quota and a 16 TB hard quota. For example, if David Haussler is the PI you report to directly, the directory would be /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15 TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the '/usr/bin/viewquota' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /private/groups/hausslerlab for example, you would do:
$ viewquota hausslerlab
Project quota on /export (/dev/mapper/export)
Project ID Used Soft Hard Warn/Grace
---------- ---------------------------------
hausslerlab 1.8T 15T 16T 00 [------]
'''Soft Versus Hard Quotas'''
We use soft and hard quotas for disk space.
Once you exceed a directory's soft quota, a one-week countdown timer starts. When that timer runs out, you will no longer be able to create new files or write more data in that directory. You can reset the countdown timer by dropping down to under the soft quota limit.
You will not be permitted to exceed a directory's hard quota at all. Any attempt will produce an error; the precise error depends on how your software responds to running out of disk space.
When quotas are first applied to a directory, or are reduced, it is possible to end up with more data or files in the directory than the quota allows for. This outcome does not trigger deletion of any existing data, but will prevent creation of new data or files.
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear at any time, for example after a disk failure. Do not store important data there; if data is important, move it somewhere else very soon after creation.
81954e6468c8c452de3d5f4e16e798b6dda6608d
360
359
2023-06-14T23:26:18Z
Weiler
3
/* Storage */
wikitext
text/x-wiki
== Storage ==
Our servers mount two types of ''shared'' storage: home directories and group storage directories. Both mount over the network to all shared compute servers and the phoenix cluster, so any server you log into will have these filesystems available:
'''Filesystem Specifications'''
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Filesystem<br />
! /private/home
! /private/groups
|-
| style="font-weight:bold; text-align:left;" | Default Soft Quota
| 30 GB
| 15 TB
|-
| style="font-weight:bold; text-align:left;" | Default Hard Quota
| 31 GB
| 16 TB
|-
| style="font-weight:bold; text-align:left;" | Total Capacity
| 19 TB
| 500 TB
|- style="text-align:left;"
| style="font-weight:bold;" | Access Speed
| Moderate (Spinning Disk)
| Very Fast (NVMe Flash Media)
|- style="text-align:left;"
| style="font-weight:bold;" | Intended Use
| This space should be used for login scripts, small bits of code or software repos, etc. No large data should be stored here.
| This space should be used for large computational/shared data, large software installations and the like.
|}
'''Home Directories (/private/home/username)'''
Your home directory is located at /private/home/username and has a 30 GB soft quota and a 31 GB hard quota. It is meant for small scripts, login data, or a git repo. Please do not store large data there or run large compute jobs against data in your home directory.
'''Groups Directories (/private/groups/groupname)'''
The group storage directories are created per PI, and each has a default 15 TB soft quota and a 16 TB hard quota. For example, if David Haussler is the PI you report to directly, the directory would be /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15 TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the '/usr/bin/viewquota' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /private/groups/hausslerlab for example, you would do:
$ viewquota hausslerlab
Project quota on /export (/dev/mapper/export)
Project ID Used Soft Hard Warn/Grace
---------- ---------------------------------
hausslerlab 1.8T 15T 16T 00 [------]
'''Soft Versus Hard Quotas'''
We use soft and hard quotas for disk space.
Once you exceed a directory's soft quota, a one-week countdown timer starts. When that timer runs out, you will no longer be able to create new files or write more data in that directory. You can reset the countdown timer by dropping down to under the soft quota limit.
You will not be permitted to exceed a directory's hard quota at all. Any attempt will produce an error; the precise error depends on how your software responds to running out of disk space.
When quotas are first applied to a directory, or are reduced, it is possible to end up with more data or files in the directory than the quota allows for. This outcome does not trigger deletion of any existing data, but will prevent creation of new data or files.
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear at any time, for example after a disk failure. Do not store important data there; if data is important, move it somewhere else very soon after creation.
e18a634d9c93f067400fdc4b99b278915ef721ff
392
360
2023-07-16T14:56:12Z
Weiler
3
wikitext
text/x-wiki
== Storage ==
Our servers mount two types of ''shared'' storage: home directories and group storage directories. Both mount over the network to all shared compute servers and the phoenix cluster, so any server you log into will have these filesystems available:
'''Filesystem Specifications'''
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Filesystem<br />
! /private/home
! /private/groups
|-
| style="font-weight:bold; text-align:left;" | Default Soft Quota
| 30 GB
| 15 TB
|-
| style="font-weight:bold; text-align:left;" | Default Hard Quota
| 31 GB
| 16 TB
|-
| style="font-weight:bold; text-align:left;" | Total Capacity
| 19 TB
| 500 TB
|- style="text-align:left;"
| style="font-weight:bold;" | Access Speed
| Very Fast (NVMe Flash Media)
| Very Fast (NVMe Flash Media)
|- style="text-align:left;"
| style="font-weight:bold;" | Intended Use
| This space should be used for login scripts, small bits of code or software repos, etc. No large data should be stored here.
| This space should be used for large computational/shared data, large software installations and the like.
|}
'''Home Directories (/private/home/username)'''
Your home directory is located at /private/home/username and has a 30 GB soft quota and a 31 GB hard quota. It is meant for small scripts, login data, or a git repo. Please do not store large data there or run large compute jobs against data in your home directory.
'''Groups Directories (/private/groups/groupname)'''
The group storage directories are created per PI, and each has a default 15 TB soft quota and a 16 TB hard quota. For example, if David Haussler is the PI you report to directly, the directory would be /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15 TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the '/usr/bin/viewquota' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /private/groups/hausslerlab for example, you would do:
$ viewquota hausslerlab
Project quota on /export (/dev/mapper/export)
Project ID Used Soft Hard Warn/Grace
---------- ---------------------------------
hausslerlab 1.8T 15T 16T 00 [------]
'''Soft Versus Hard Quotas'''
We use soft and hard quotas for disk space.
Once you exceed a directory's soft quota, a one-week countdown timer starts. When that timer runs out, you will no longer be able to create new files or write more data in that directory. You can reset the countdown timer by dropping down to under the soft quota limit.
You will not be permitted to exceed a directory's hard quota at all. Any attempt will produce an error; the precise error depends on how your software responds to running out of disk space.
When quotas are first applied to a directory, or are reduced, it is possible to end up with more data or files in the directory than the quota allows for. This outcome does not trigger deletion of any existing data, but will prevent creation of new data or files.
== /scratch Space on the Servers ==
Each server will generally have a local /scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /scratch is not backed up, and the data there could disappear at any time, for example after a disk failure. Do not store important data there; if data is important, move it somewhere else very soon after creation.
bcbd1619120facf47dfd5123c6a8998741d21afb
393
392
2023-07-16T18:19:36Z
Weiler
3
/* /scratch Space on the Servers */
wikitext
text/x-wiki
== Storage ==
Our servers mount two types of ''shared'' storage: home directories and group storage directories. Both mount over the network to all shared compute servers and the phoenix cluster, so any server you log into will have these filesystems available:
'''Filesystem Specifications'''
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Filesystem<br />
! /private/home
! /private/groups
|-
| style="font-weight:bold; text-align:left;" | Default Soft Quota
| 30 GB
| 15 TB
|-
| style="font-weight:bold; text-align:left;" | Default Hard Quota
| 31 GB
| 16 TB
|-
| style="font-weight:bold; text-align:left;" | Total Capacity
| 19 TB
| 500 TB
|- style="text-align:left;"
| style="font-weight:bold;" | Access Speed
| Very Fast (NVMe Flash Media)
| Very Fast (NVMe Flash Media)
|- style="text-align:left;"
| style="font-weight:bold;" | Intended Use
| This space should be used for login scripts, small bits of code or software repos, etc. No large data should be stored here.
| This space should be used for large computational/shared data, large software installations and the like.
|}
'''Home Directories (/private/home/username)'''
Your home directory is located at /private/home/username and has a 30 GB soft quota and a 31 GB hard quota. It is meant for small scripts, login data, or a git repo. Please do not store large data there or run large compute jobs against data in your home directory.
'''Groups Directories (/private/groups/groupname)'''
The group storage directories are created per PI, and each has a default 15 TB soft quota and a 16 TB hard quota. For example, if David Haussler is the PI you report to directly, the directory would be /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15 TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the '/usr/bin/viewquota' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /private/groups/hausslerlab for example, you would do:
$ viewquota hausslerlab
Project quota on /export (/dev/mapper/export)
Project ID Used Soft Hard Warn/Grace
---------- ---------------------------------
hausslerlab 1.8T 15T 16T 00 [------]
'''Soft Versus Hard Quotas'''
We use soft and hard quotas for disk space.
Once you exceed a directory's soft quota, a one-week countdown timer starts. When that timer runs out, you will no longer be able to create new files or write more data in that directory. You can reset the countdown timer by dropping down to under the soft quota limit.
You will not be permitted to exceed a directory's hard quota at all. Any attempt will produce an error; the precise error depends on how your software responds to running out of disk space.
When quotas are first applied to a directory, or are reduced, it is possible to end up with more data or files in the directory than the quota allows for. This outcome does not trigger deletion of any existing data, but will prevent creation of new data or files.
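A compact way to see how these rules interact is a toy model. The function below sketches the policy described above (whether a given write would be allowed); it is purely illustrative and is not how the filesystem actually enforces quotas:

```shell
# quota_check USED WRITE_SIZE SOFT HARD GRACE_EXPIRED(yes|no)
# Prints whether a write of WRITE_SIZE would be allowed, per the rules above.
quota_check() {
    local used=$1 size=$2 soft=$3 hard=$4 grace_expired=$5
    if [ $((used + size)) -gt "$hard" ]; then
        echo "denied-hard"        # hard quota can never be exceeded
    elif [ "$used" -gt "$soft" ] && [ "$grace_expired" = yes ]; then
        echo "denied-grace"       # over soft quota and the one-week grace ran out
    else
        echo "allowed"
    fi
}

quota_check 10 5 30 31 no    # allowed
quota_check 30 2 30 31 no    # denied-hard
quota_check 31 0 30 31 yes   # denied-grace
```

Dropping back under the soft quota (here, lowering USED below SOFT) corresponds to resetting the countdown timer.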
== /data/scratch Space on the Servers ==
Each server will generally have a local /data/scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /data/scratch is not backed up, and the data there could disappear at any time, for example after a disk failure. Do not store important data there; if data is important, move it somewhere else very soon after creation.
d063a4213b104cebadc0e43cdc7478557f5a6978
Access to the Firewalled Compute Servers
0
17
361
339
2023-06-14T23:27:30Z
Weiler
3
wikitext
text/x-wiki
Before you can access the firewalled environment (Prism), you must get VPN access to it, which is detailed here:
[[Requirement for users to get GI VPN access]]
== Account Expiration ==
Your UNIX account will have an expiration date associated with it after creation. If your account was created in January, February or March, then your account expiration date will be July 1st of the '''current year'''. If the account was created after March, then your expiration date will be July 1st of the '''following year'''.
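The expiration rule above can be expressed as a small helper function (a sketch of the policy as stated, not of any actual admin tooling):

```shell
# account_expiration YYYY-MM-DD
# Accounts created Jan-Mar expire July 1st of the current year;
# accounts created after March expire July 1st of the following year.
account_expiration() {
    local created=$1
    local year=${created%%-*}
    local month=${created#*-}
    month=${month%%-*}
    month=$((10#$month))   # strip leading zero so 08/09 are not read as octal
    if [ "$month" -le 3 ]; then
        echo "${year}-07-01"
    else
        echo "$((year + 1))-07-01"
    fi
}

account_expiration 2024-02-15   # 2024-07-01
account_expiration 2024-09-30   # 2025-07-01
```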
You will receive notice by email when your account is about to expire. To renew, simply ask the PI that sponsored you (who will be named in the notice) to email '''cluster-admin@soe.ucsc.edu''' requesting that your account be renewed for another year.
If your account expires, the account will be suspended and you will no longer be able to login or view any data you may have in our systems. Any automated scripts (owned by you) that run via cron or other mechanisms will cease to function.
a7c1e5214097845153106518450e3e375d824ec9
Firewalled Storage Cost
0
42
363
2023-06-14T23:33:04Z
Weiler
3
Created page with "== Account and Storage Cost == Costs for having an active UNIX account and for storage (per TB) are listed in this document, under "Genomics Project Support", specifically "G..."
wikitext
text/x-wiki
== Account and Storage Cost ==
Costs for having an active UNIX account and for storage (per TB) are listed in this document, under "Genomics Project Support", specifically "Genomics IT Systems User Support per user" and "Genomics Data Storage per TB":
https://planning.ucsc.edu/budget/rates-and-assessments/recharge-rates/docs/2021-22-approved-recharge-rates.pdf
As of the writing of this document, it looks like this:
{| class="wikitable"
|- style="font-weight:bold; text-align:center;"
! Service
! Cost
|-
| UNIX User Account per Month
| style="text-align:center;" | $28.77
|-
| OpenStack User Account per Month
| style="text-align:center;" | $28.77
|-
| TB of Storage per Month
| style="text-align:center;" | $14.97
|}
a4eeedba218192a00dfcbe1f327c00eb421688ae
364
363
2023-06-14T23:34:24Z
Weiler
3
/* Account and Storage Cost */
wikitext
text/x-wiki
== Account and Storage Cost ==
Costs for having an active UNIX account and for storage (per TB) are listed in this document, under "Genomics Project Support", specifically "Genomics IT Systems User Support per user" and "Genomics Data Storage per TB":
https://planning.ucsc.edu/budget/rates-and-assessments/recharge-rates/docs/2021-22-approved-recharge-rates.pdf
As of the writing of this document, it looks like this:
{| class="wikitable"
|- style="font-weight:bold; text-align:center;"
! Service
! Cost
|-
| UNIX User Account per Month
| style="text-align:center;" | $28.77
|-
| OpenStack User Account per Month
| style="text-align:center;" | $28.77
|-
| TB of Storage per Month
| style="text-align:center;" | $14.97
|}
The sponsor of each user and owner of each /private/groups/labname area provides a FOAPAL to our finance group to cover the monthly cost of these resources.
7dc99182212c31eb7d20e9ec7d89c7ff44758d6e
365
364
2023-06-14T23:35:00Z
Weiler
3
/* Account and Storage Cost */
wikitext
text/x-wiki
da39a3ee5e6b4b0d3255bfef95601890afd80709
Firewalled User Account and Storage Cost
0
43
367
2023-06-14T23:35:27Z
Weiler
3
Created page with "== Account and Storage Cost == Costs for having an active UNIX account and for storage (per TB) are listed in this document, under "Genomics Project Support", specifically "G..."
wikitext
text/x-wiki
== Account and Storage Cost ==
Costs for having an active UNIX account and for storage (per TB) are listed in this document, under "Genomics Project Support", specifically "Genomics IT Systems User Support per user" and "Genomics Data Storage per TB":
https://planning.ucsc.edu/budget/rates-and-assessments/recharge-rates/docs/2021-22-approved-recharge-rates.pdf
As of the writing of this document, it looks like this:
{| class="wikitable"
|- style="font-weight:bold; text-align:center;"
! Service
! Cost
|-
| UNIX User Account per Month
| style="text-align:center;" | $28.77
|-
| OpenStack User Account per Month
| style="text-align:center;" | $28.77
|-
| TB of Storage per Month
| style="text-align:center;" | $14.97
|}
The sponsor of each user and owner of each /private/groups/labname area provides a FOAPAL to our finance group to cover the monthly cost of these resources.
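Using the rates in the table above, a lab's monthly recharge can be estimated with a small helper (rates are hardcoded from the 2021-22 table; check the linked PDF for current numbers):

```shell
# monthly_cost UNIX_USERS OPENSTACK_USERS STORAGE_TB
# Both account types are billed at $28.77/month; storage at $14.97/TB/month.
monthly_cost() {
    awk -v u="$1" -v o="$2" -v tb="$3" \
        'BEGIN { printf "%.2f\n", (u + o) * 28.77 + tb * 14.97 }'
}

monthly_cost 5 0 16   # a lab with 5 UNIX accounts and 16 TB: 383.37
```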
7dc99182212c31eb7d20e9ec7d89c7ff44758d6e
Overview of using Slurm
0
32
370
338
2023-06-15T16:22:50Z
Weiler
3
/* Submit a Slurm Batch Job */
wikitext
text/x-wiki
When using Slurm, you will need to log into the Slurm head node (currently phoenix.prism). Once you have ssh'd in there, you can execute Slurm batch or interactive commands.
You might also want to consult the [[Quick Reference Guide]].
== Submit a Slurm Batch Job ==
In order to submit a Slurm batch job, you will need to create a directory that you have read and write access to on all the nodes (which will often be a shared space). Let's say I have a batch named "experiment-1". I would create that directory in my group's area:
% mkdir -p /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1
% cd /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1
Then you will need to create your job submission batch file. It will look something like this. My file is called 'slurm-test.sh':
% vim slurm-test.sh
Then populate the file as necessary:
#!/bin/bash
# Job name:
#SBATCH --job-name=weiler_test
#
# Partition - This is the queue it goes in:
#SBATCH --partition=main
#
# Where to send email (optional)
#SBATCH --mail-user=weiler@ucsc.edu
#
# Number of nodes you need per job:
#SBATCH --nodes=1
#
# Memory needed for the jobs. Try very hard to make this accurate. DEFAULT = 4gb
#SBATCH --mem=4gb
#
# Number of tasks (one for each CPU desired for use case) (example):
#SBATCH --ntasks=1
#
# Processors per task:
# At least eight times the number of GPUs needed for NVIDIA RTX A5500
#SBATCH --cpus-per-task=1
#
# Number of GPUs, this can be in the format of "--gres=gpu:[1-8]", or "--gres=gpu:A5500:[1-8]" with the type included (optional)
#SBATCH --gres=gpu:1
#
# Standard output and error log
#SBATCH --output=serial_test_%j.log
#
# Wall clock limit in hrs:min:sec:
#SBATCH --time=00:00:30
#
## Command(s) to run (example):
pwd; hostname; date
echo "Running test script on a single CPU core"
sleep 5
echo "Test done!"
date
Keep the "SBATCH" lines commented; the scheduler will read them anyway. If you don't need a particular option, just don't include it in the file.
To submit the batch job:
% sbatch slurm-test.sh
Submitted batch job 7
The job(s) will then be scheduled. You can see the state of the queue as such:
% squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
7 batch weiler_t weiler R 0:07 1 phoenix-01
The job will write any STDOUT or STDERR to the log file in the directory you launched it from. Beyond that, it will do whatever the job script does, even if it produces no STDOUT.
== Launching Several Jobs at Once ==
You can launch many jobs at once using the $SLURM_ARRAY_TASK_ID variable. Add something like the following to your batch submission file:
#SBATCH --array=0-31
#SBATCH --output=array_job_%A_task_%a.out
#SBATCH --error=array_job_%A_task_%a.err
## Command(s) to run:
echo "I am task $SLURM_ARRAY_TASK_ID"
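A common pattern is for each array task to use its ID to select a per-task input. A minimal sketch (the `inputs/sample_N.fastq` layout is hypothetical, just for illustration; the default of 0 lets you test the script outside of Slurm):

```shell
#!/bin/bash
# Each array task picks its own input file based on its task ID.
# Outside of Slurm, SLURM_ARRAY_TASK_ID is unset, so default to 0 for testing.
TASK_ID=${SLURM_ARRAY_TASK_ID:-0}
INPUT="inputs/sample_${TASK_ID}.fastq"
echo "task ${TASK_ID} processing ${INPUT}"
```

Submitted with `--array=0-31`, this runs 32 tasks, each processing its own file.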
== CGROUPS and Resource Management ==
Our installation of Slurm utilizes Linux CGROUPS, which put a hard resource cap on jobs. If you declare that your job needs 4GB of RAM and it uses 5GB, it will fail with an OOM exception. Likewise with CPU or GPU resources: if your job ends up using more than you specify, it will fail. The same goes for the "--time" option: your job will fail if it runs longer than the time you specify there. This is to keep the nodes from crashing under runaway jobs that use more resources than you think they will.
So... TEST YOUR JOBS! Find out how much in resources a single job needs before you launch 100 of them.
== TEST YOUR JOBS! ==
Let me say that one more time: test your jobs before launching a bunch of them! If a job fails, you don't want it to fail 100 or more times. Testing also shows you how much RAM and CPU a job needs, so you can better define your batch files. It's critical to understand how much of each resource your jobs will use and define your job file appropriately.
7891e5fd384ed929f11c5c8a705812179a691b47
Slurm Tips for Toil
0
38
371
329
2023-06-27T14:08:56Z
Anovak
4
wikitext
text/x-wiki
Here are some tips for running Toil workflows on the Phoenix Slurm cluster. Mostly you will want to run WDL workflows, but some of these tips also apply to other workflows like Cactus. You can also consult [https://github.com/DataBiosphere/toil/blob/master/docs/running/wdl.rst#running-wdl-with-toil the Toil documentation on WDL workflows].
* Install Toil with WDL support with:
pip3 install --upgrade toil[wdl]
To use a development version of Toil, you can install from source instead:
pip3 install git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl]
* You will then need to make sure your '''~/.local/bin''' directory is on your PATH. Open up your '''~/.bashrc''' file and add:
export PATH=$PATH:$HOME/.local/bin
Then make sure to log out and back in again.
* For Toil options, you will want '''--batchSystem slurm''' to make it use Slurm and '''--batchLogsDir ./logs''' (or some other location on a shared filesystem) for the Slurm logs to not get lost.
* You may be able to speed up your workflow with '''--caching true''', to cache data on nodes to be shared among multiple simultaneous tasks.
* If using '''toil-wdl-runner''', you might want to add '''--jobStore ./jobStore''' to make sure the job store is in a defined, shared location so that you can use '''--restart''' later.
* If using '''toil-wdl-runner''', you will want to set the '''SINGULARITY_CACHEDIR''' and '''MINIWDL__SINGULARITY__IMAGE_CACHE''' environment variables for your workflow to locations on shared storage, possibly the default cache locations in your home directory. Otherwise Toil will set them to node-local directories on each node, and thus re-download images for each workflow run and for each cluster node. To avoid this, before your run or in your '''~/.bashrc''' you could:
export SINGULARITY_CACHEDIR=$HOME/.singularity/cache
export MINIWDL__SINGULARITY__IMAGE_CACHE=$HOME/.cache/miniwdl
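Putting the tips above together, a run might be set up like this. The `workflow.wdl` and `inputs.json` names are placeholders for your own files, and the invocation itself is shown commented out since it requires Toil and the Slurm cluster to be available:

```shell
# Shared image caches, so images are downloaded once rather than per node:
export SINGULARITY_CACHEDIR="$HOME/.singularity/cache"
export MINIWDL__SINGULARITY__IMAGE_CACHE="$HOME/.cache/miniwdl"

# Combine the options discussed above (uncomment to actually run):
# toil-wdl-runner workflow.wdl inputs.json \
#     --batchSystem slurm --batchLogsDir ./logs \
#     --jobStore ./jobStore --caching true

echo "caches: $SINGULARITY_CACHEDIR, $MINIWDL__SINGULARITY__IMAGE_CACHE"
```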
e5a78c161565178ada8ca9efdf97ccde6214ebc9
Using Docker under Slurm
0
44
373
2023-06-27T21:21:56Z
Weiler
3
Created page with "Sometimes it is convenient to ask Slurm to run your job in a docker container. This is just fine, however, you will need to fully test your job in a docker container beforeha..."
wikitext
text/x-wiki
__TOC__
Sometimes it is convenient to ask Slurm to run your job in a docker container. This is just fine; however, you will need to fully test your job in a docker container beforehand (on mustard or emerald, for example) to see how much RAM and CPU it requires, so you can accurately describe in your Slurm job submission file how many resources it needs.
== Testing ==
You can run your container on mustard then look at 'top' to see how much RAM and CPU it needs.
You will also need to pull your docker image from a registry, like DockerHub or Quay. And you should run your docker container with the '--rm' flag, so the container cleans itself up after running. So your workflow would look something like this:
1: Pull image from DockerHub
2: docker run --rm docker/welcome-to-docker
Optionally you can clean up your image as well, but only if you don't have many jobs using that image on the same node. For example, if I wanted to remove the image labeled "weiler/mytools":
$ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
weiler/mytools latest be6777ad00cf 19 hours ago 396MB
somedude/tools latest 9b1d1f6fbf6f 3 weeks ago 607MB
$ docker image rm be6777ad00cf
== Resource Limits ==
When running docker containers on Slurm, Slurm cannot limit the resources that docker uses. Therefore, when you launch a container, you will need to know how many resources (RAM, CPU) it uses beforehand, determined by your testing. Then launch your job with the following --cpus and --memory parameters so docker itself will limit what it uses:
docker run --rm '''--cpus=16 --memory=1024m''' docker/welcome-to-docker
The --memory argument here is in megabytes (hence the 'm' suffix), so the above example sets a memory limit of 1024 MB (1 GB).
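As a sketch, a Slurm batch script for such a job might request the same resources from Slurm that you pass to Docker. The image name here is just Docker's example image, and the numbers should come from your own testing.

```shell
# Write a hypothetical Slurm submission script whose Slurm resource
# request matches the limits passed to Docker (16 CPUs, 1024 MB RAM).
cat > docker_job.sh <<'EOF'
#!/bin/bash
#SBATCH --cpus-per-task=16
#SBATCH --mem=1024
docker pull docker/welcome-to-docker
docker run --rm --cpus=16 --memory=1024m docker/welcome-to-docker
EOF
```

You would then submit it with '''sbatch docker_job.sh'''.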
== Cleaning Scripts ==
We also have auto-cleaning scripts running that will delete any containers and images that were created/pulled more than 7 days ago. This includes the cluster nodes and also the phoenix head node itself. If you need a place to have your images/containers remain longer than that, please put them on mustard, emerald, crimson or razzmatazz.
Also, there are cleaning scripts in place that will destroy any running containers that have been running for over 7 days. We assume that such a container was not launched with '''--rm''' and needs to be cleaned up.
1610bc006943e8967550e2c6c674431f263cb745
Phoenix WDL Tutorial
0
45
376
2023-06-28T21:46:41Z
Anovak
4
Created page with "=Tutorial: Getting Started with WDL Workflows on Phoenix= Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experime..."
wikitext
text/x-wiki
=Tutorial: Getting Started with WDL Workflows on Phoenix=
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows on the Genomics Institute's Phoenix cluster. By the end, you will know how to run workflows on Phoenix with [https://toil.readthedocs.io/en/latest/ Toil], and how to write your own workflows in WDL.
==Setup==
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
===Getting VPN access===
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
===Connecting to Phoenix===
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should *not* run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
===Installing Toil with WDL support===
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user toil[wdl]
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter *will not* look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, **log out and log back in**, to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
===Configuring Toil for Phoenix===
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. So, use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${HOME}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${HOME}/.cache/miniwdl"' >>~/.bashrc
After that, **log out and log back in again**, to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day], so it is important not to skip this step.
==Running an existing workflow==
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
===Preparing an input file===
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an *inputs file*, which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for the key, we're using the workflow name, a dot, and then the input name. For the value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
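For instance, an inputs file that pointed at the same input by URL instead would look like this (the URL here is only a placeholder):

```json
{
  "hello_caller.who": "https://example.com/names.txt"
}
```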
===Testing at small scale single-machine===
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --pty bash -i
This will start a new shell; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
===Running at larger scale===
Back on the head node, let's prepare to run a larger run. Greeting 3 people isn't cool, let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
==Writing your own workflow==
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
===Writing the file===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is *not* optional, and there is no default value, then the user's inputs file *must* specify a value for it in order for the workflow to run.
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only *once* in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional *expressions* with a <code>then</code> and an <code>else</code>, but conditional *statements* only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
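As a hedged illustration (this line is not part of the workflow we're building), a conditional ''expression'' in WDL looks like this, with both branches required:

```
String noise_or_number = if (one_based % to_fizz == 0) then "Fizz" else "number"
```

A conditional ''statement'', by contrast, has only a body, and any variable declared inside it is <code>null</code> wherever the condition was false.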
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access. Here we only make the call when we aren't producing a "Fizz" or "Buzz" noise instead.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We'll also tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, which may suggest that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need this, but you do need it in 1.1, and Toil doesn't actually support omitting it yet, so we're going to make one. We need to collect together all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
==Additional WDL resources==
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
5df3e6a659f8dcaf260eff01c736f57dea64c2c2
377
376
2023-06-28T21:47:28Z
Anovak
4
wikitext
text/x-wiki
=Tutorial: Getting Started with WDL Workflows on Phoenix=
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like Workflow Description language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows on the Genomics Institute's Phoenix cluster. By the end, you will know how to run workflows on Phoenix with [https://toil.readthedocs.io/en/latest/ Toil], and how to write your own workflows in WDL.
==Setup==
Before we begin, you will need a computer you can install software on, and the ability to connect to other machines over SSH.
===Getting VPN access===
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
===Connecting to Phoenix===
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should *not* run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
===Installing Toil with WDL support===
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user toil[wdl]
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter *will not* look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, **log out and log back in**, to restart bash and pick up the change.
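If you want the new setting in your current shell right away, you can apply the same change by hand; logging out and back in is still the reliable way to make sure every future session picks it up. A minimal sketch:

```shell
# Apply the same PATH change to the current shell session by hand;
# new logins will pick it up from ~/.bashrc automatically.
export PATH="${HOME}/.local/bin:${PATH}"
# The first PATH entry is now the user-level install location:
echo "$PATH" | tr ':' '\n' | head -n 1
```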
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
===Configuring Toil for Phoenix===
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. So, use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${HOME}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${HOME}/.cache/miniwdl"' >>~/.bashrc
After that, **log out and log back in again**, to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day], so it is important not to skip this step.
==Running an existing workflow==
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
===Preparing an input file===
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an *inputs file*, which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
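JSON is picky about quoting and commas, and a malformed inputs file is a common first stumbling block. One quick sanity check, using Python 3 (which the <code>pip</code> install above already requires), is to round-trip the file through <code>json.tool</code>:

```shell
# Validate the inputs file: json.tool pretty-prints valid JSON, and
# exits with an error message if the syntax is broken.
echo '{"hello_caller.who": "./names.txt"}' > inputs.json
python3 -m json.tool inputs.json
```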
===Testing at small scale single-machine===
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --pty bash -i
This will start a new shell; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
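That output JSON maps fully-qualified output names to values, so it can be picked apart programmatically. As a sketch (with a trimmed-down sample document standing in for the real run's output), Python's <code>json</code> module extracts individual values by name:

```shell
# Parse workflow outputs by fully-qualified name. The sample document
# here is a trimmed stand-in for the JSON the run actually prints.
cat > sample_outputs.json <<'EOF'
{"hello_caller.messages": ["Hello, Ritchie Ravi!"]}
EOF
python3 -c 'import json
outputs = json.load(open("sample_outputs.json"))
print(outputs["hello_caller.messages"][0])'
```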
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
===Running at larger scale===
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
==Writing your own workflow==
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
===Writing the file===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is *not* optional, and there is no default value, then the user's inputs file *must* specify a value for it in order for the workflow to run.
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
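If you want to see what <code>range()</code> will produce, the shell's <code>seq</code> makes the zero-based, half-open behavior concrete:

```shell
# WDL's range(item_count) yields [0, 1, ..., item_count - 1];
# seq can print the equivalent sequence for a quick mental check.
item_count=5
seq 0 $((item_count - 1))
```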
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only *once* in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional *expressions* with a <code>then</code> and an <code>else</code>, but conditional *statements* only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
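The overall plan can be cross-checked in plain Bash: compute each candidate independently, leave it empty when its condition fails, and then take the first non-empty candidate, which is exactly what <code>select_first()</code> will do with the <code>null</code>s:

```shell
# Mirror the WDL logic: each candidate is set only when its condition
# holds, and the first non-empty candidate wins (like select_first()).
to_fizz=3; to_buzz=5
for one_based in $(seq 1 15); do
  fizz=""; buzz=""; fizzbuzz=""
  [ $((one_based % to_fizz)) -eq 0 ] && fizz="Fizz"
  [ $((one_based % to_buzz)) -eq 0 ] && buzz="Buzz"
  [ -n "$fizz" ] && [ -n "$buzz" ] && fizzbuzz="FizzBuzz"
  for candidate in "$fizzbuzz" "$fizz" "$buzz" "$one_based"; do
    if [ -n "$candidate" ]; then echo "$candidate"; break; fi
  done
done
```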
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, though here we only make the call when we aren't producing a "Fizz" or "Buzz" noise instead.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> function returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
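The way <code>read_string(stdout())</code> drops the trailing newline matches how the shell's own command substitution behaves, which is easy to verify directly:

```shell
# Command substitution, like WDL's read_string(), strips the trailing
# newline that echo appends, so the captured string has 2 characters.
the_string="$(echo 42)"
printf '%s' "$the_string" | wc -c
```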
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We're also going to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, optionally requesting that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need this, but you do need it in 1.1, and Toil doesn't actually support omitting it yet, so we're going to make one. We need to collect together all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
==Additional WDL resources==
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
137c1ac39d34ed13abfae71a74a92e2fa2696502
383
377
2023-07-07T14:33:13Z
Anovak
4
/* Testing at small scale single-machine */
wikitext
text/x-wiki
=Tutorial: Getting Started with WDL Workflows on Phoenix=
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows on the Genomics Institute's Phoenix cluster. By the end, you will know how to run workflows on Phoenix with [https://toil.readthedocs.io/en/latest/ Toil], and how to write your own workflows in WDL.
==Setup==
Before we begin, you will need a computer you can install software on, and the ability to connect to other machines over SSH.
===Getting VPN access===
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
===Connecting to Phoenix===
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should *not* run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
===Installing Toil with WDL support===
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user toil[wdl]
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter *will not* look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, **log out and log back in**, to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
===Configuring Toil for Phoenix===
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. So, use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${HOME}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${HOME}/.cache/miniwdl"' >>~/.bashrc
After that, **log out and log back in again**, to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day], so it is important not to skip this step.
==Running an existing workflow==
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
===Preparing an input file===
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an *inputs file*, which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
===Testing at small scale single-machine===
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --pty bash -i
This will start a new shell; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
===Running at larger scale===
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
==Writing your own workflow==
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
===Writing the file===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
    input {
        # How many FizzBuzz numbers do we want to make?
        Int item_count
        # Every multiple of this number, we produce "Fizz"
        Int to_fizz = 3
        # Every multiple of this number, we produce "Buzz"
        Int to_buzz = 5
        # Optional replacement for the string to print when a multiple of both
        String? fizzbuzz_override
    }
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is *not* optional, and there is no default value, then the user's inputs file *must* specify a value for it in order for the workflow to run.
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
    Int one_based = i + 1
}
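If it helps, the scatter above behaves roughly like a Python list comprehension; each iteration's declarations are collected into arrays afterwards (the <code>item_count</code> value here is just for illustration):

```python
item_count = 5  # illustrative value; in the workflow this is a user input

# WDL's range(item_count) gives [0, 1, ..., item_count - 1].
numbers = list(range(item_count))
# The scatter body runs once per element; every one_based it declares is
# gathered into an Array[Int] outside the scatter.
one_based = [i + 1 for i in numbers]
print(one_based)  # [1, 2, 3, 4, 5]
```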
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only *once* in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional *expressions* with a <code>then</code> and an <code>else</code>, but conditional *statements* only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
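As a rough Python model of <code>select_first()</code> and the <code>null</code> values left behind by un-executed conditionals (the example values are invented):

```python
def select_first(values):
    # Like WDL's select_first(): return the first non-null value;
    # it is an error if every value is null.
    for value in values:
        if value is not None:
            return value
    raise ValueError("all values are null")

# If the optional fizzbuzz_override input was never set, it is null (None),
# so the default "FizzBuzz" wins; if it was set, the override wins.
print(select_first([None, "FizzBuzz"]))    # FizzBuzz
print(select_first(["Zap!", "FizzBuzz"]))  # Zap!
```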
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
    Int one_based = i + 1
    if (one_based % to_fizz == 0) {
        String fizz = "Fizz"
        if (one_based % to_buzz == 0) {
            String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
        }
    }
    if (one_based % to_buzz == 0) {
        String buzz = "Buzz"
    }
    if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
        # Just a normal number.
    }
}
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in task <code>command</code> sections. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, at least on iterations where the call actually ran; on iterations where it didn't, its outputs are <code>null</code>.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
    Int one_based = i + 1
    if (one_based % to_fizz == 0) {
        String fizz = "Fizz"
        if (one_based % to_buzz == 0) {
            String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
        }
    }
    if (one_based % to_buzz == 0) {
        String buzz = "Buzz"
    }
    if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
        # Just a normal number.
        call stringify_number {
            input:
                the_number = one_based
        }
    }
    String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
    input {
        # How many FizzBuzz numbers do we want to make?
        Int item_count
        # Every multiple of this number, we produce "Fizz"
        Int to_fizz = 3
        # Every multiple of this number, we produce "Buzz"
        Int to_buzz = 5
        # Optional replacement for the string to print when a multiple of both
        String? fizzbuzz_override
    }
    Array[Int] numbers = range(item_count)
    scatter (i in numbers) {
        Int one_based = i + 1
        if (one_based % to_fizz == 0) {
            String fizz = "Fizz"
            if (one_based % to_buzz == 0) {
                String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
            }
        }
        if (one_based % to_buzz == 0) {
            String buzz = "Buzz"
        }
        if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
            # Just a normal number.
            call stringify_number {
                input:
                    the_number = one_based
            }
        }
        String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
    }
}
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
    input {
        Int the_number
    }
    # ???
    output {
        String the_string # = ???
    }
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
    input {
        Int the_number
    }
    command <<<
        # This is a Bash script.
        # So we should do good Bash script things like stop on errors
        set -e
        # Now print our number as a string
        echo ~{the_number}
    >>>
    output {
        String the_string # = ???
    }
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
    input {
        Int the_number
    }
    command <<<
        # This is a Bash script.
        # So we should do good Bash script things like stop on errors
        set -e
        # Now print our number as a string
        echo ~{the_number}
    >>>
    output {
        String the_string = read_string(stdout())
    }
}
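For intuition, here is a Python sketch of what capturing <code>stdout()</code> and reading it back with <code>read_string()</code> amounts to, using the same <code>echo</code> command the task runs (the number is an illustrative value):

```python
import subprocess

the_number = 42  # illustrative value; the task receives this as an input

# Run the echo command and capture its standard output, like stdout() does.
result = subprocess.run(
    ["echo", str(the_number)], capture_output=True, text=True, check=True
)
# read_string() reads the captured file back and removes the trailing newline.
the_string = result.stdout.rstrip("\n")
print(the_string)  # 42
```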
We also want to add a <code>runtime</code> section to our task, to specify resource requirements. We're going to tell it to run in a Docker container too, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
    input {
        Int the_number
    }
    command <<<
        # This is a Bash script.
        # So we should do good Bash script things like stop on errors
        set -e
        # Now print our number as a string
        echo ~{the_number}
    >>>
    output {
        String the_string = read_string(stdout())
    }
    runtime {
        cpu: 1
        memory: "0.5 GB"
        disks: "local-disk 1 SSD"
        docker: "ubuntu:22.04"
    }
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, which may suggest that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
    input {
        # How many FizzBuzz numbers do we want to make?
        Int item_count
        # Every multiple of this number, we produce "Fizz"
        Int to_fizz = 3
        # Every multiple of this number, we produce "Buzz"
        Int to_buzz = 5
        # Optional replacement for the string to print when a multiple of both
        String? fizzbuzz_override
    }
    Array[Int] numbers = range(item_count)
    scatter (i in numbers) {
        Int one_based = i + 1
        if (one_based % to_fizz == 0) {
            String fizz = "Fizz"
            if (one_based % to_buzz == 0) {
                String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
            }
        }
        if (one_based % to_buzz == 0) {
            String buzz = "Buzz"
        }
        if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
            # Just a normal number.
            call stringify_number {
                input:
                    the_number = one_based
            }
        }
        String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
    }
}
task stringify_number {
    input {
        Int the_number
    }
    command <<<
        # This is a Bash script.
        # So we should do good Bash script things like stop on errors
        set -e
        # Now print our number as a string
        echo ~{the_number}
    >>>
    output {
        String the_string = read_string(stdout())
    }
    runtime {
        cpu: 1
        memory: "0.5 GB"
        disks: "local-disk 1 SSD"
        docker: "ubuntu:22.04"
    }
}
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need this, but you do need it in 1.1, and Toil doesn't actually support omitting it yet, so we're going to make one. We need to collect together all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
    input {
        # How many FizzBuzz numbers do we want to make?
        Int item_count
        # Every multiple of this number, we produce "Fizz"
        Int to_fizz = 3
        # Every multiple of this number, we produce "Buzz"
        Int to_buzz = 5
        # Optional replacement for the string to print when a multiple of both
        String? fizzbuzz_override
    }
    Array[Int] numbers = range(item_count)
    scatter (i in numbers) {
        Int one_based = i + 1
        if (one_based % to_fizz == 0) {
            String fizz = "Fizz"
            if (one_based % to_buzz == 0) {
                String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
            }
        }
        if (one_based % to_buzz == 0) {
            String buzz = "Buzz"
        }
        if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
            # Just a normal number.
            call stringify_number {
                input:
                    the_number = one_based
            }
        }
        String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
    }
    output {
        Array[String] fizzbuzz_results = result
    }
}
task stringify_number {
    input {
        Int the_number
    }
    command <<<
        # This is a Bash script.
        # So we should do good Bash script things like stop on errors
        set -e
        # Now print our number as a string
        echo ~{the_number}
    >>>
    output {
        String the_string = read_string(stdout())
    }
    runtime {
        cpu: 1
        memory: "0.5 GB"
        disks: "local-disk 1 SSD"
        docker: "ubuntu:22.04"
    }
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
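As a sanity check on the workflow's logic, here is a plain-Python sketch of the same computation, with the same defaults and the same <code>select_first()</code> precedence:

```python
def fizzbuzz(item_count, to_fizz=3, to_buzz=5, fizzbuzz_override=None):
    # Mirrors the workflow: scatter over range(item_count), 1-based numbers,
    # and select_first([fizzbuzz, fizz, buzz, stringified]) precedence.
    results = []
    for i in range(item_count):
        one_based = i + 1
        fizz = "Fizz" if one_based % to_fizz == 0 else None
        fizzbuzz_val = None
        if fizz is not None and one_based % to_buzz == 0:
            fizzbuzz_val = fizzbuzz_override if fizzbuzz_override is not None else "FizzBuzz"
        buzz = "Buzz" if one_based % to_buzz == 0 else None
        # The stringify_number call only runs for normal numbers.
        stringified = str(one_based) if fizz is None and buzz is None else None
        # select_first(): the first non-null value wins.
        results.append(next(v for v in (fizzbuzz_val, fizz, buzz, stringified) if v is not None))
    return results

print(fizzbuzz(15))
# ['1', '2', 'Fizz', '4', 'Buzz', 'Fizz', '7', '8', 'Fizz', 'Buzz',
#  '11', 'Fizz', '13', '14', 'FizzBuzz']
```

Running it with an override shows the <code>fizzbuzz_override</code> input replacing only the combined "FizzBuzz" entries.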
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
==Additional WDL resources==
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
=Tutorial: Getting Started with WDL Workflows on Phoenix=
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows on the Genomics Institute's Phoenix cluster. By the end, you will know how to run workflows on Phoenix with [https://toil.readthedocs.io/en/latest/ Toil], and how to write your own workflows in WDL.
==Setup==
Before we begin, you will need a computer you can install software on, and the ability to connect to other machines over SSH.
===Getting VPN access===
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
===Connecting to Phoenix===
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should *not* run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different from your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
===Installing Toil with WDL support===
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user toil[wdl]
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter *will not* look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, **log out and log back in**, to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
===Configuring Toil for Phoenix===
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. So, use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${HOME}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${HOME}/.cache/miniwdl"' >>~/.bashrc
After that, **log out and log back in again**, to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day], so it is important not to skip this step.
==Running an existing workflow==
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
===Preparing an input file===
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an *inputs file*, which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren's supposed to need this, but you do need it in 1.1 and Toil doesn't actually support not having one yet, so we're going to make one. We need to collect together all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string]
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
==Additional WDL resources==
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL dcoumentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
a0c34f24329c930ee6b74c4e18f3b1a8c9699d10
385
384
2023-07-07T14:36:19Z
Anovak
4
wikitext
text/x-wiki
=Tutorial: Getting Started with WDL Workflows on Phoenix=
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows on the Genomics Institute's Phoenix cluster. By the end, you will be able to run workflows on Phoenix with [https://toil.readthedocs.io/en/latest/ Toil] and write your own workflows in WDL.
==Setup==
Before we begin, you will need a computer you can install software on, and the ability to connect to other machines over SSH.
===Getting VPN access===
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
===Connecting to Phoenix===
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should *not* run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
===Installing Toil with WDL support===
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user toil[wdl]
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter *will not* look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, **log out and log back in**, to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
===Configuring Toil for Phoenix===
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. So, use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${HOME}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${HOME}/.cache/miniwdl"' >>~/.bashrc
After that, **log out and log back in again**, to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day], so it is important not to skip this step.
==Running an existing workflow==
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
Start by downloading the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
===Preparing an input file===
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an *inputs file*, which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
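If it helps to see how such a key is assembled, here is a small Python sketch (just an illustration, not one of this tutorial's commands) that builds the same inputs file content:

```python
import json

# A WDL inputs file key is "<workflow name>.<input name>".
workflow_name = "hello_caller"
input_name = "who"
inputs = {workflow_name + "." + input_name: "./names.txt"}

# json.dumps produces the same one-line JSON we wrote by hand above.
text = json.dumps(inputs)
print(text)
```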
===Testing at small scale on a single machine===
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --pty bash -i
This will start a new shell; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
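To see how those escape sequences decode, here is a Python illustration (an aside, not part of the run) of parsing a line like the one above:

```python
import json

# Non-ASCII characters in the runner's JSON output appear as \uXXXX
# escapes; parsing the JSON turns them back into the real characters.
line = '{"hello_caller.messages": ["Hello, Mridula Resurrecci\\u00f3n!"]}'
parsed = json.loads(line)
first_message = parsed["hello_caller.messages"][0]
print(first_message)  # Hello, Mridula Resurrección!
```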
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
===Running at larger scale===
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
==Writing your own workflow==
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
===Writing the file===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is *not* optional, and there is no default value, then the user's inputs file *must* specify a value for it in order for the workflow to run.
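If you think in Python, the input block behaves roughly like default and optional function arguments; this analogy (not WDL, and not part of the workflow file) may help:

```python
from typing import Optional, Tuple

# Rough analogy for the input section: to_fizz and to_buzz have
# defaults, fizzbuzz_override is optional (None plays the role of
# WDL's null), and item_count is required because it has neither.
def fizzbuzz_inputs(item_count: int,
                    to_fizz: int = 3,
                    to_buzz: int = 5,
                    fizzbuzz_override: Optional[str] = None
                    ) -> Tuple[int, int, int, Optional[str]]:
    return (item_count, to_fizz, to_buzz, fizzbuzz_override)

print(fizzbuzz_inputs(20))  # (20, 3, 5, None)
```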
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
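In Python terms (purely as an illustration), the two statements above behave like building a list and mapping over it:

```python
# WDL's range() behaves like Python's range(): range(5) gives
# [0, 1, 2, 3, 4]. The scatter body then runs once per element,
# like a map or list comprehension, except all in parallel.
item_count = 5
numbers = list(range(item_count))
one_based = [i + 1 for i in numbers]  # the scatter body, per item
print(one_based)  # [1, 2, 3, 4, 5]
```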
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only *once* in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional *expressions* with a <code>then</code> and an <code>else</code>, but conditional *statements* only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
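The behavior of <code>select_first()</code> can be sketched in Python (an illustration only, with None standing in for WDL's null from a conditional that didn't run):

```python
def select_first(values):
    # Return the first value that is not null (None here).
    for value in values:
        if value is not None:
            return value
    raise ValueError("all values were null")

fizzbuzz_override = None  # the optional input was not set
print(select_first([fizzbuzz_override, "FizzBuzz"]))  # FizzBuzz
```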
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, but only when we don't produce a noise instead.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
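Putting the whole scatter body together, its effect on a single number can be sketched in Python (again just an analogy: None models null, and <code>str()</code> stands in for the <code>stringify_number</code> task we haven't written yet):

```python
def fizzbuzz_word(one_based, to_fizz=3, to_buzz=5, fizzbuzz_override=None):
    # Each conditional either sets its variable or leaves it null (None).
    fizz = "Fizz" if one_based % to_fizz == 0 else None
    fizzbuzz = None
    if fizz is not None and one_based % to_buzz == 0:
        fizzbuzz = fizzbuzz_override if fizzbuzz_override is not None else "FizzBuzz"
    buzz = "Buzz" if one_based % to_buzz == 0 else None
    stringified = None
    if one_based % to_fizz != 0 and one_based % to_buzz != 0:
        stringified = str(one_based)  # stands in for the task call
    # select_first: the first non-null candidate wins.
    return next(v for v in (fizzbuzz, fizz, buzz, stringified) if v is not None)

print([fizzbuzz_word(n) for n in range(1, 16)])
# ['1', '2', 'Fizz', '4', 'Buzz', 'Fizz', '7', '8', 'Fizz', 'Buzz',
#  '11', 'Fizz', '13', '14', 'FizzBuzz']
```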
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
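What the task does can be mimicked in Python (an aside for intuition, assuming an ordinary Unix <code>echo</code>): run the command, capture its standard output, and strip the trailing newline like <code>read_string()</code> does:

```python
import subprocess

the_number = 42

# Run the command and capture its standard output, like the task does.
completed = subprocess.run(["echo", str(the_number)],
                           capture_output=True, text=True, check=True)

# read_string() reads the output back and removes trailing newlines.
the_string = completed.stdout.rstrip("\n")
print(repr(the_string))  # '42'
```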
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We'll also tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, with an optional hint that the storage should be <code>SSD</code>.
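To make the string's format concrete, here is a tiny hypothetical parser for it in Python (the function and its return shape are made up for illustration; no real API is being shown):

```python
def parse_disks(spec):
    # "local-disk 1 SSD" -> (mount name, size in gigabytes, disk type)
    name, size_gb, disk_type = spec.split()
    return (name, int(size_gb), disk_type)

print(parse_disks("local-disk 1 SSD"))  # ('local-disk', 1, 'SSD')
```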
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need one, but you do need it in 1.1, and Toil doesn't actually support leaving it out yet, so we're going to make one. We need to collect all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as an array.
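The gathering behavior works like this Python sketch (illustration only): a value computed once per iteration is, from outside, the list of all iterations' values:

```python
# Inside the scatter: one String per item. Outside: an Array[String].
numbers = range(5)                      # like Array[Int] numbers = range(5)
result = [str(i + 1) for i in numbers]  # per-item value, gathered
print(result)  # ['1', '2', '3', '4', '5']
```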
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
==Additional WDL resources==
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
a87638e980628d176893a19cb5b3a5e7b09dff4f
386
385
2023-07-11T13:49:04Z
Anovak
4
wikitext
text/x-wiki
=Tutorial: Getting Started with WDL Workflows on Phoenix=
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows on the Genomics Institute's Phoenix cluster. By the end, you will be able to run workflows on Phoenix with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
==Setup==
Before we begin, you will need a computer you can install software on, and the ability to connect to other machines over SSH.
===Getting VPN access===
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
===Connecting to Phoenix===
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should *not* run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
===Installing Toil with WDL support===
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user toil[wdl]
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter *will not* look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, **log out and log back in**, to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
===Configuring Toil for Phoenix===
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. So, use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${HOME}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${HOME}/.cache/miniwdl"' >>~/.bashrc
After that, **log out and log back in again**, to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day], so it is important not to skip this step.
==Running an existing workflow==
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
To start, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
===Preparing an input file===
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an *inputs file*, which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
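If you'd rather not hand-edit JSON, you can also generate the inputs file from a short script. Here is a minimal Python sketch (the <code>hello_caller.who</code> key comes from the workflow above; the rest is just standard-library <code>json</code> usage):

```python
import json

# Keys are "<workflow name>.<input name>"; file values are paths
# relative to the inputs file (absolute paths and URLs also work).
inputs = {"hello_caller.who": "./names.txt"}

with open("inputs.json", "w") as f:
    json.dump(inputs, f)
```

This produces the same <code>inputs.json</code> as the <code>echo</code> command above, and scales better once a workflow has many inputs.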
===Testing at small scale on a single machine===
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --pty bash -i
This will start a new shell; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
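Those <code>\uXXXX</code> sequences are ordinary JSON string escapes, so any JSON parser turns them back into the original characters. A quick Python check (illustrative only):

```python
import json

# JSON commonly escapes non-ASCII characters like ó as \u00f3;
# parsing the JSON string decodes them again.
decoded = json.loads('"Mridula Resurrecci\\u00f3n"')
print(decoded)  # Mridula Resurrección
```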
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
===Running at larger scale===
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
==Writing your own workflow==
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
===Writing the file===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is *not* optional, and there is no default value, then the user's inputs file *must* specify a value for it in order for the workflow to run.
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
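If it helps, the scatter above behaves like a list comprehension in Python: the body runs once per input value, and each declaration inside becomes an array outside. A rough sketch of the analogy (not real WDL semantics):

```python
item_count = 5

# WDL: Array[Int] numbers = range(item_count)
numbers = list(range(item_count))

# WDL: scatter (i in numbers) { Int one_based = i + 1 }
# Outside the scatter, one_based is an Array[Int] holding
# the value from every iteration.
one_based = [i + 1 for i in numbers]

print(one_based)  # [1, 2, 3, 4, 5]
```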
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only *once* in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional *expressions* with a <code>then</code> and an <code>else</code>, but conditional *statements* only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the call with <code>.</code> access.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
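To see why <code>select_first()</code> always picks the right label, here is the same decision logic written as plain Python, with <code>None</code> standing in for WDL's <code>null</code> (a sketch of the semantics, not how Toil actually executes the workflow):

```python
def fizzbuzz_label(one_based, to_fizz=3, to_buzz=5, fizzbuzz_override=None):
    # Declarations inside un-executed WDL conditionals are null;
    # we model that here with None.
    fizz = "Fizz" if one_based % to_fizz == 0 else None
    buzz = "Buzz" if one_based % to_buzz == 0 else None
    fizzbuzz = None
    if fizz is not None and buzz is not None:
        # WDL: select_first([fizzbuzz_override, "FizzBuzz"])
        fizzbuzz = fizzbuzz_override if fizzbuzz_override is not None else "FizzBuzz"
    # The stringify_number call only runs for plain numbers.
    as_string = str(one_based) if fizz is None and buzz is None else None
    # WDL: select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
    return next(v for v in [fizzbuzz, fizz, buzz, as_string] if v is not None)

print([fizzbuzz_label(n) for n in range(1, 16)])
```

Exactly one branch chain leaves a non-null value for each number, so <code>select_first()</code> never has to break a tie.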
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
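In other words, the <code>output</code> section reads the command's captured standard output back as a string, with trailing newlines removed. Roughly, in Python terms:

```python
the_number = 42

# What `echo ~{the_number}` writes to the file returned by stdout():
captured = f"{the_number}\n"

# read_string() reads that file and strips trailing newlines:
the_string = captured.rstrip("\n")

print(repr(the_string))  # '42'
```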
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements, and to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs that let you control each task's resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, optionally requesting that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need one, but you do need it in 1.1, and Toil doesn't yet support leaving it out, so we're going to add one. We need to collect all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
==Debugging Workflows==
===Frequently Asked Questions===
====I am getting warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>====
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
==Additional WDL resources==
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, but only on the iterations where we didn't make a noise (a Fizz or a Buzz) instead.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
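Since the scatter works a bit like a Python <code>map()</code>, the whole branch-and-<code>select_first()</code> trick can be pictured in Python. This is an analogy only, not real WDL semantics: each conditional either sets its variable or leaves it <code>None</code> (WDL's <code>null</code>), and <code>select_first()</code> picks the first one that was actually set. The <code>stringified</code> variable stands in for the <code>stringify_number</code> task call.

```python
def select_first(values):
    # Like WDL's select_first(): return the first non-null entry.
    for v in values:
        if v is not None:
            return v
    raise ValueError("all values were null")

def fizzbuzz_word(one_based, to_fizz=3, to_buzz=5, fizzbuzz_override=None):
    # Each branch either assigns a value or leaves the variable as None.
    fizz = "Fizz" if one_based % to_fizz == 0 else None
    fizzbuzz = None
    if fizz is not None and one_based % to_buzz == 0:
        fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
    buzz = "Buzz" if one_based % to_buzz == 0 else None
    # Stand-in for the stringify_number task call on plain numbers.
    stringified = str(one_based) if fizz is None and buzz is None else None
    return select_first([fizzbuzz, fizz, buzz, stringified])
```

Exactly one of the four candidates ends up non-null for any given number, so <code>select_first()</code> always has something to pick.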
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
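What <code>read_string(stdout())</code> does can be approximated in a few lines of Python. This is a minimal stand-in to show the trailing-newline behavior, not Toil's actual implementation: the command's standard output lands in a file, and reading it back as a string drops the trailing newline that <code>echo</code> adds.

```python
import tempfile

def read_string(path):
    # Approximation of WDL's read_string(): read the whole file and
    # strip trailing newline characters.
    with open(path) as f:
        return f.read().rstrip("\r\n")

# Simulate what `echo 42` would leave in the task's stdout file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("42\n")
    stdout_file = f.name

the_string = read_string(stdout_file)
```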
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. And we're going to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> attribute is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, optionally suggesting that it should be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need one, but you do need it in 1.1, and Toil doesn't actually support omitting it yet, so we're going to make one. We need to collect all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
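In Python terms (again an analogy, not WDL internals), the scatter produces one <code>result</code> per iteration, and any reference outside the scatter sees the gathered list of all of them:

```python
# Scatter-then-gather, pictured as a plain loop: inside the loop `word`
# is one String; after the loop, `fizzbuzz_results` is the Array[String].
item_count = 15
fizzbuzz_results = []
for i in range(item_count):  # scatter (i in numbers)
    one_based = i + 1
    if one_based % 3 == 0 and one_based % 5 == 0:
        word = "FizzBuzz"
    elif one_based % 3 == 0:
        word = "Fizz"
    elif one_based % 5 == 0:
        word = "Buzz"
    else:
        word = str(one_based)
    fizzbuzz_results.append(word)  # the gather step
```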
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
==Debugging Workflows==
===Frequently Asked Questions===
====I am getting warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>====
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
====Toil said it was <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code> but I can't find that file!====
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
==Additional WDL resources==
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
=Tutorial: Getting Started with WDL Workflows on Phoenix=
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows on the Genomics Institute's Phoenix cluster. By the end, you will be able to run workflows on Phoenix with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
==Setup==
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
===Getting VPN access===
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
===Connecting to Phoenix===
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should *not* run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
===Installing Toil with WDL support===
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user toil[wdl]
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter *will not* look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, **log out and log back in**, to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
===Configuring Toil for Phoenix===
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. So, use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${HOME}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${HOME}/.cache/miniwdl"' >>~/.bashrc
After that, **log out and log back in again**, to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day], so it is important not to skip this step.
==Running an existing workflow==
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
===Preparing an input file===
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an *inputs file*, which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
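If your inputs get more complicated than one key, you may prefer to generate the inputs file from code rather than with <code>echo</code>. A small sketch (purely a convenience; the <code>echo</code> command above does the same thing):

```python
import json
import os
import tempfile

# Build the same inputs file programmatically; keys are
# "<workflow name>.<input name>", values are the input values.
inputs = {"hello_caller.who": "./names.txt"}
path = os.path.join(tempfile.mkdtemp(), "inputs.json")
with open(path, "w") as f:
    json.dump(inputs, f)

# Read it back to confirm it is valid JSON with the expected key.
with open(path) as f:
    loaded = json.load(f)
```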
===Testing at small scale on a single machine===
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --pty bash -i
This will start a new shell; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
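Those <code>\u00f3</code>-style sequences are just standard JSON escapes for non-ASCII characters; any JSON parser turns them back into the real characters. For example, in Python:

```python
import json

# The workflow's output JSON escapes non-ASCII characters; decoding the
# JSON recovers the actual file name.
escaped = '"local_run/Mridula Resurrecci\\u00f3n.txt"'
decoded = json.loads(escaped)
```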
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
===Running at larger scale===
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
==Debugging Workflows==
===Frequently Asked Questions===
====I am getting warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>====
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
====Toil said it was <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code> but I can't find that file!====
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
==Additional WDL resources==
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL dcoumentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
e43765b47dcf619b67858999eeffc3715d252ce1
394
388
2023-07-17T15:06:56Z
Anovak
4
/* Frequently Asked Questions */
wikitext
text/x-wiki
=Tutorial: Getting Started with WDL Workflows on Phoenix=
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows on the Genomics Institute's Phoenix cluster. By the end, you will be able to run workflows on Phoenix with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
==Setup==
Before we begin, you will need a computer to work at that you are able to install software on, and the ability to connect to other machines over SSH.
===Getting VPN access===
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
===Connecting to Phoenix===
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should *not* run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
===Installing Toil with WDL support===
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user toil[wdl]
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter *will not* look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, **log out and log back in**, to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
===Configuring Toil for Phoenix===
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. So, use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${HOME}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${HOME}/.cache/miniwdl"' >>~/.bashrc
After that, **log out and log back in again**, to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day], so it is important not to skip this step.
==Running an existing workflow==
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
===Preparing an input file===
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an *inputs file*, which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
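If you are generating inputs files from a script rather than by hand, any JSON library will do. As a minimal sketch (assuming the same <code>hello_caller.who</code> input as above), here is the same file written from Python:

```python
import json

# Build the same inputs file from Python instead of echo.
# The key is "<workflow name>.<input name>"; the value is a path relative to
# the location of the inputs file (absolute paths and URLs also work).
inputs = {"hello_caller.who": "./names.txt"}

with open("inputs.json", "w") as f:
    json.dump(inputs, f)

# Reading it back shows the structure Toil will see.
with open("inputs.json") as f:
    loaded = json.load(f)
print(loaded["hello_caller.who"])
```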
===Testing at small scale on a single machine===
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --pty bash -i
This will start a new shell; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
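The <code>\u00f3</code>-style sequences in the printed JSON are standard ASCII-safe JSON string escapes, so any JSON parser will recover the real characters. For example, in Python:

```python
import json

# A fragment of the workflow's printed output JSON, with \u escapes as literal text.
line = '{"messages": ["Hello, Mridula Resurrecci\\u00f3n!", "Hello, Gershom \\u0160arlota!"]}'

# Parsing the JSON decodes the escapes back into the original Unicode characters.
decoded = json.loads(line)
print(decoded["messages"][0])  # Hello, Mridula Resurrección!
```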
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
===Running at larger scale===
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
==Writing your own workflow==
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
===Writing the file===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is *not* optional, and there is no default value, then the user's inputs file *must* specify a value for it in order for the workflow to run.
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
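If it helps, the two lines above behave roughly like this Python sketch (Python here is purely an analogy; none of it is WDL syntax, and <code>item_count</code> is hardcoded as a stand-in for the workflow input):

```python
# Rough Python analogy for range() plus the scatter.
# Each "iteration" is independent, so a WDL runner is free to run them in parallel.
item_count = 20  # stand-in for the workflow input

numbers = list(range(item_count))      # WDL: Array[Int] numbers = range(item_count)
one_based = [i + 1 for i in numbers]   # WDL: the scatter body, gathered into an array
print(one_based[:5])  # [1, 2, 3, 4, 5]
```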
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only *once* in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional *expressions* with a <code>then</code> and an <code>else</code>, but conditional *statements* only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
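In Python terms, <code>select_first()</code> behaves roughly like this sketch (the override string is a made-up example value):

```python
# Python sketch of WDL's select_first(): return the first non-null value.
def select_first(values):
    for v in values:
        if v is not None:
            return v
    raise ValueError("select_first: all values were null")

# With no override provided, the optional input behaves like None,
# so the default string is selected.
fizzbuzz_override = None
print(select_first([fizzbuzz_override, "FizzBuzz"]))  # FizzBuzz

# With an override set (a hypothetical example value), it wins instead.
print(select_first(["Fizzy Buzzy", "FizzBuzz"]))  # Fizzy Buzzy
```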
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access; those outputs will only be non-null when the call actually ran (that is, when we didn't produce a noise instead).
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
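In Python terms, the effect of <code>read_string(stdout())</code> is roughly this sketch (the file name is made up to simulate a task's captured standard output):

```python
# Rough Python equivalent of read_string(stdout()): read the captured
# standard-output file and strip the trailing newline that commands
# like echo append.
def read_string(path):
    with open(path) as f:
        return f.read().rstrip("\n")

# Simulate what `echo 7` would leave in the task's stdout file:
with open("task_stdout.txt", "w") as f:
    f.write("7\n")

print(read_string("task_stdout.txt"))  # 7
```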
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We're also going to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, optionally suggesting that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need one, but you do need it in 1.1, and Toil doesn't actually support leaving it out yet, so we're going to make one. We need to collect all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
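Putting the whole workflow body together, this Python sketch (an analogy only, with the inputs hardcoded) shows how each scatter iteration produces one <code>result</code> string, and how the gathered values form the <code>fizzbuzz_results</code> array:

```python
# End-to-end Python analogy of the workflow body: inside the scatter, `result`
# is a single String; outside, the gathered values form an Array[String].
item_count, to_fizz, to_buzz = 20, 3, 5
fizzbuzz_override = None

def select_first(values):
    # WDL's select_first(): first non-null value.
    return next(v for v in values if v is not None)

fizzbuzz_results = []
for i in range(item_count):
    one_based = i + 1
    # Variables from un-executed conditionals are null (None here).
    fizz = "Fizz" if one_based % to_fizz == 0 else None
    buzz = "Buzz" if one_based % to_buzz == 0 else None
    fizzbuzz = (select_first([fizzbuzz_override, "FizzBuzz"])
                if fizz and buzz else None)
    # Stand-in for the stringify_number task call on plain numbers.
    stringified = str(one_based) if not fizz and not buzz else None
    fizzbuzz_results.append(select_first([fizzbuzz, fizz, buzz, stringified]))

print(fizzbuzz_results[:5])  # ['1', '2', 'Fizz', '4', 'Buzz']
```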
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
==Debugging Workflows==
===Frequently Asked Questions===
====I am getting warnings about <code>XDG_RUNTIME_DIR</code>====
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
====Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!====
Toil will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
==Additional WDL resources==
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
==Running an existing workflow==
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
===Preparing an input file===
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an *inputs file*, which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
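If you are generating inputs programmatically (for example, for many samples), the same <code>workflow.input</code> key convention is easy to build with any JSON library. Here is an illustrative Python sketch (the <code>make_inputs</code> helper is our own invention, not part of Toil):

```python
import json

def make_inputs(workflow_name, **inputs):
    """Build a WDL inputs dictionary keyed as '<workflow>.<input>'."""
    return {f"{workflow_name}.{name}": value for name, value in inputs.items()}

# Produce the same inputs file as the echo command above.
inputs = make_inputs("hello_caller", who="./names.txt")
with open("inputs.json", "w") as f:
    json.dump(inputs, f)
print(inputs)
```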
===Testing at small scale on a single machine===
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --pty bash -i
This will start a new shell; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
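Those escape sequences are ordinary JSON string escapes, so any JSON parser will decode them for you. A quick illustrative check in Python:

```python
import json

# A fragment of the toil-wdl-runner output JSON, with an ASCII-safe
# escape sequence (\u00f3 is "ó").
raw = '{"hello_caller.messages": ["Hello, Mridula Resurrecci\\u00f3n!"]}'

# json.loads turns the escapes back into real characters.
outputs = json.loads(raw)
message = outputs["hello_caller.messages"][0]
print(message)  # Hello, Mridula Resurrección!
```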
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
===Running at larger scale===
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
==Writing your own workflow==
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
===Writing the file===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is *not* optional, and there is no default value, then the user's inputs file *must* specify a value for it in order for the workflow to run.
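The resolution rules can be sketched in Python (this is an illustrative analogue of how a WDL runner fills in inputs, not Toil's actual implementation):

```python
# WDL input resolution, roughly: defaults fill gaps, optional (?) inputs
# may stay null (None here), and a required input with no default must
# be supplied by the user's inputs file.
def resolve_inputs(user_inputs):
    resolved = {
        "to_fizz": 3,               # Int with a default
        "to_buzz": 5,               # Int with a default
        "fizzbuzz_override": None,  # String?, optional, may remain null
    }
    resolved.update(user_inputs)
    if "item_count" not in resolved:  # required: no default, not optional
        raise ValueError("item_count must be set in the inputs file")
    return resolved

print(resolve_inputs({"item_count": 20}))
```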
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only *once* in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional *expressions* with a <code>then</code> and an <code>else</code>, but conditional *statements* only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access; if the call sits in a conditional that didn't execute, its outputs will be <code>null</code>.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
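The scatter-and-<code>select_first()</code> logic above can be mirrored in plain Python, which may help if the null-from-unexecuted-conditionals trick feels unfamiliar. This is an illustrative analogue, not how Toil executes WDL:

```python
# Each conditional that doesn't run leaves its variable as None (WDL's
# null); select_first picks the first non-null value, just like WDL's.
def select_first(values):
    return next(v for v in values if v is not None)

def fizzbuzz(one_based, to_fizz=3, to_buzz=5, fizzbuzz_override=None):
    fizz = "Fizz" if one_based % to_fizz == 0 else None
    buzz = "Buzz" if one_based % to_buzz == 0 else None
    fb = None
    if fizz is not None and buzz is not None:
        fb = select_first([fizzbuzz_override, "FizzBuzz"])
    # Stands in for the stringify_number task call on normal numbers.
    return select_first([fb, fizz, buzz, str(one_based)])

# The scatter works like a map over range(item_count), shifted to start at 1.
results = [fizzbuzz(i + 1) for i in range(15)]
print(results)
```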
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements, and to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, which may suggest that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need this, but you do need it in 1.1, and Toil doesn't yet support omitting it, so we're going to make one. We need to collect together all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
==Debugging Workflows==
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
===Debugging Options===
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs are reproduced like this.
===Reading the Log===
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which will happen when either the command line command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard error at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stderr.txt:
And
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard output at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stdout.txt:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
===Reproducing Problems===
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
===More Ways of Finding Files===
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
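The decoding can also be done without a web tool. Here is an illustrative Python sketch using the example URI from the log line above:

```python
from urllib.parse import unquote

# A toilfile URI as it appears in a Toil debug log (from the example above).
uri = ("toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob"
       "%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351"
       "%2FSample.chr14.bam/Sample.chr14.bam")

# URL-decode the percent escapes (%3A is ':', %2F is '/').
decoded = unquote(uri)

# The job-store-relative path is everything after the last colon.
relative_path = decoded.rsplit(":", 1)[1]
print(relative_path)
```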
===Frequently Asked Questions===
====I am getting warnings about <code>XDG_RUNTIME_DIR</code>====
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
====Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!====
Toil will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
==Additional WDL resources==
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
e94ce99693859a64b8145edb1e1e171bc14c007f
396
395
2023-07-17T15:27:01Z
Anovak
4
/* More Ways of Finding Files */
wikitext
text/x-wiki
=Tutorial: Getting Started with WDL Workflows on Phoenix=
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like Workflow Description language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows on the Genomics Institute's Phoenix cluster. By the end, you will be able to run workflows on Phoenix with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
==Setup==
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
===Getting VPN access===
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
===Connecting to Phoenix===
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should *not* run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
===Installing Toil with WDL support===
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user toil[wdl]
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter *will not* look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, **log out and log back in**, to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
===Configuring Toil for Phoenix===
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. So, use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${HOME}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${HOME}/.cache/miniwdl"' >>~/.bashrc
After that, **log out and log back in again**, to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day], so it is important not to skip this step.
==Running an existing workflow==
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
===Preparing an input file===
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
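When a workflow has many inputs, it can also be handy to generate the inputs file with a small script instead of <code>echo</code>. A minimal Python sketch (illustrative only, not a required tutorial step) that writes the same file as the command above:

```python
import json

# The same inputs the echo command above produces: workflow name, a dot,
# then the input name, mapped to a relative file path.
inputs = {"hello_caller.who": "./names.txt"}

with open("inputs.json", "w") as f:
    json.dump(inputs, f)

# Round-trip it to confirm the file is valid JSON.
with open("inputs.json") as f:
    print(json.load(f)["hello_caller.who"])
```

Using <code>json.dump()</code> also takes care of any quoting or escaping for you.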
===Testing at small scale on a single machine===
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --pty bash -i
This will start a new shell; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
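The <code>\u00f3</code>-style sequences are just how JSON escapes non-ASCII characters; any JSON parser will hand you back the real characters. For example, in Python:

```python
import json

# A string copied from the workflow's JSON output, with a \u00f3 escape in it.
escaped = '"Mridula Resurrecci\\u00f3n.txt"'

# Parsing the JSON turns the escape back into the actual character.
print(json.loads(escaped))
```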
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
===Running at larger scale===
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
==Writing your own workflow==
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
===Writing the file===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
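As a rough analogy (Python shown only for illustration; WDL runs the scatter body for each element in parallel, not in a loop):

```python
item_count = 5  # stand-in for the workflow input

numbers = range(item_count)           # WDL: range(item_count) -> 0 through 4
one_based = [i + 1 for i in numbers]  # WDL: the scatter body, gathered into an array
print(one_based)
```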
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access (though here, since the call is inside a conditional, its output is only available when we didn't make a noise instead).
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
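If the conditionals-plus-<code>select_first()</code> pattern is hard to follow, here is the same logic in plain Python, using <code>None</code> to stand in for the variables of un-executed WDL conditionals (an illustration only, not part of the workflow):

```python
def fizzbuzz_word(one_based, to_fizz=3, to_buzz=5, fizzbuzz_override=None):
    # In WDL, a variable in an un-executed conditional is null; model that with None.
    fizz = buzz = fizzbuzz = number = None
    if one_based % to_fizz == 0:
        fizz = "Fizz"
        if one_based % to_buzz == 0:
            # WDL: select_first([fizzbuzz_override, "FizzBuzz"])
            fizzbuzz = fizzbuzz_override if fizzbuzz_override is not None else "FizzBuzz"
    if one_based % to_buzz == 0:
        buzz = "Buzz"
    if one_based % to_fizz != 0 and one_based % to_buzz != 0:
        number = str(one_based)  # stands in for the stringify_number task call
    # WDL select_first(): the first non-null value wins.
    return next(value for value in (fizzbuzz, fizz, buzz, number) if value is not None)

print([fizzbuzz_word(i + 1) for i in range(15)])
```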
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
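In other words, <code>read_string(stdout())</code> behaves like reading the captured output file and stripping the trailing newline. A quick Python illustration of that behavior:

```python
# echo writes the number followed by a newline.
captured = "1\n"

# read_string() gives back the text with the trailing newline removed.
the_string = captured.rstrip("\n")
print(repr(the_string))
```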
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We'll also tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, optionally marked as <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need this, but you do need it in 1.1, and Toil doesn't yet support omitting it, so we're going to make one. We need to collect together all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
==Debugging Workflows==
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
===Debugging Options===
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs are reproduced like this.
===Reading the Log===
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which will happen when either the command line command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
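Exit status 0 means success, and anything nonzero means failure; this is a general shell convention, not something Toil-specific. A quick illustration using Python's <code>subprocess</code> module and the standard <code>true</code> and <code>false</code> commands:

```python
import subprocess

# 'true' always succeeds (exit status 0); 'false' always fails (exit status 1).
ok = subprocess.run(["true"])
bad = subprocess.run(["false"])
print(ok.returncode, bad.returncode)
```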
Go up higher in the log until you find lines that look like:
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard error at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stderr.txt:
And
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard output at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stdout.txt:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
===Reproducing Problems===
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
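Putting the two parts together is plain path concatenation, which you can do by hand or in a script. A small sketch using the example job store path and file ID from above:

```python
import os.path

# Example values from the log excerpt above.
job_store = "/private/groups/patenlab/anovak/jobstore"
file_id = ("files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/"
           "file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam")

full_path = os.path.join(job_store, file_id)
print(full_path)
```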
===More Ways of Finding Files===
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
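You can also do the decoding without a website, for example with Python's standard library:

```python
from urllib.parse import unquote

# The toilfile: URI copied from the log line above.
uri = ("toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob"
       "%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351"
       "%2FSample.chr14.bam/Sample.chr14.bam")

# URL-decode it, turning %3A into ':' and %2F into '/'.
decoded = unquote(uri)

# The path relative to the job store is everything after the last colon.
job_store_relative_path = decoded.split(":")[-1]
print(job_store_relative_path)
```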
===Frequently Asked Questions===
====I am getting warnings about <code>XDG_RUNTIME_DIR</code>====
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
====Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!====
Toil will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
==Additional WDL resources==
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
=Tutorial: Getting Started with WDL Workflows on Phoenix=
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like Workflow Description language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows on the Genomics Institute's Phoenix cluster. By the end, you will be able to run workflows on Phoenix with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
==Setup==
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
===Getting VPN access===
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
===Connecting to Phoenix===
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should *not* run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
===Installing Toil with WDL support===
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter *will not* look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, **log out and log back in**, to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
===Configuring Toil for Phoenix===
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. So, use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${HOME}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${HOME}/.cache/miniwdl"' >>~/.bashrc
After that, **log out and log back in again**, to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day], so it is important not to skip this step.
==Running an existing workflow==
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
===Preparing an input file===
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an *inputs file*, which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
===Testing at small scale single-machine===
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --pty bash -i
This will start a new shell; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
===Running at larger scale===
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
==Writing your own workflow==
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
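As a reminder of what we're building, here is the target behavior sketched as a plain Bash loop (using the default divisors of 3 and 5):

```shell
# Classic Fizz Buzz: multiples of 3 print "Fizz", multiples of 5 print "Buzz",
# multiples of both print "FizzBuzz", and everything else prints the number.
for i in $(seq 1 15); do
  if [ $((i % 15)) -eq 0 ]; then
    echo "FizzBuzz"
  elif [ $((i % 3)) -eq 0 ]; then
    echo "Fizz"
  elif [ $((i % 5)) -eq 0 ]; then
    echo "Buzz"
  else
    echo "$i"
  fi
done
```

The workflow version will produce the same strings, but computed by parallel tasks instead of a sequential loop.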
===Writing the file===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is *not* optional, and there is no default value, then the user's inputs file *must* specify a value for it in order for the workflow to run.
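For example, once the workflow is finished, a single inputs file could satisfy the required <code>item_count</code>, override a default, and fill in the optional input all at once (the values here are just for illustration):

```shell
# item_count has no default, so it must be set; to_buzz overrides a default,
# and fizzbuzz_override fills in an optional input.
cat >fizzbuzz_inputs_example.json <<'EOF'
{
  "FizzBuzz.item_count": 15,
  "FizzBuzz.to_buzz": 7,
  "FizzBuzz.fizzbuzz_override": "FizzBuzz, but louder"
}
EOF
# Sanity-check that the file is valid JSON:
python3 -m json.tool fizzbuzz_inputs_example.json
```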
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only *once* in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional *expressions* with a <code>then</code> and an <code>else</code>, but conditional *statements* only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, at least when the call actually ran and we didn't produce a noise instead.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We're also going to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, which may suggest that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need this, but you do need it in 1.1, and Toil doesn't actually support omitting it yet, so we're going to make one. We need to collect together all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
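The file passed to <code>-m</code> is plain JSON, so you can inspect the results with any JSON tool. A minimal sketch, using a stand-in file in place of a real run's <code>fizzbuzz_out.json</code>:

```shell
# Stand-in for the file written by -m; a real run writes fizzbuzz_out.json
# with one entry per scatter iteration.
echo '{"FizzBuzz.fizzbuzz_results": ["1", "2", "Fizz", "4", "Buzz"]}' >example_out.json
# Print each result string on its own line:
python3 -c 'import json; print("\n".join(json.load(open("example_out.json"))["FizzBuzz.fizzbuzz_results"]))'
```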
==Debugging Workflows==
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
===Debugging Options===
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs are reproduced like this.
===Reading the Log===
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which happens either when the command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard error at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stderr.txt:
And
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard output at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stdout.txt:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
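If you have the Toil log saved to a file (for example by redirecting standard error), you can pull those paths out mechanically. Here the example log line from above is used as stand-in data, and the <code>toil.log</code> file name is hypothetical:

```shell
# Stand-in log file; a real one would come from e.g. `toil-wdl-runner ... 2>toil.log`.
cat >toil.log <<'EOF'
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard error at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stderr.txt:
EOF
# Print just the file path from each "Standard error/output at ...:" line:
sed -nE 's/.*Standard (error|output) at (.*):$/\2/p' toil.log
```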
===Reproducing Problems===
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
===More Ways of Finding Files===
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
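Those decode-and-strip steps can also be done at the command line; here is a minimal sketch that uses Python's <code>urllib.parse.unquote</code> for the URL-decoding:

```shell
# The URI below is the example from this page; substitute the one from your own log.
uri='toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam'
# URL-decode it:
decoded=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote(sys.argv[1]))' "$uri")
echo "$decoded"
# The job-store-relative path is everything after the last colon:
echo "${decoded##*:}"
```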
===Frequently Asked Questions===
====I am getting warnings about <code>XDG_RUNTIME_DIR</code>====
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
====Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!====
Toil will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
==Additional WDL resources==
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
=Tutorial: Getting Started with WDL Workflows on Phoenix=
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows on the Genomics Institute's Phoenix cluster. By the end, you will be able to run workflows on Phoenix with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
==Setup==
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
===Getting VPN access===
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
===Connecting to Phoenix===
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should *not* run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
===Installing Toil with WDL support===
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter *will not* look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, **log out and log back in**, to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
===Configuring Toil for Phoenix===
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. So, use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${HOME}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${HOME}/.cache/miniwdl"' >>~/.bashrc
After that, **log out and log back in again**, to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day], so it is important not to skip this step.
==Running an existing workflow==
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
===Preparing an input file===
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an *inputs file*, which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
===Testing at small scale single-machine===
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --pty bash -i
This will start a new shell; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
===Running at larger scale===
Back on the head node, let's prepare to run a larger run. Greeting 3 people isn't cool, let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
==Writing your own workflow==
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
===Writing the file===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files. Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
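For example, an inputs file for this workflow could be generated with a few lines of Python (a sketch: the file name <code>inputs_override.json</code> and the override value are made up for illustration):

```python
import json

# A sketch of an inputs file for the FizzBuzz workflow above.
# "FizzBuzz.item_count" is required (it has no default); the
# override is optional and could be left out entirely.
inputs = {
    "FizzBuzz.item_count": 15,
    "FizzBuzz.fizzbuzz_override": "FooBar",  # hypothetical override value
}
with open("inputs_override.json", "w") as out:
    json.dump(inputs, out)
```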
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
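As a rough Python analogy (an illustration of the semantics, not how Toil actually executes WDL), the scatter behaves like a comprehension over the input array:

```python
# range() makes the input array, and the scatter body runs once per
# element, like a list comprehension (conceptually in parallel).
item_count = 5
numbers = list(range(item_count))     # WDL: range(item_count)
one_based = [i + 1 for i in numbers]  # scatter body: Int one_based = i + 1
print(one_based)  # [1, 2, 3, 4, 5]
```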
Inside the body of the scatter, we are going to put some conditionals to determine whether we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we build an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of the fact that variables from un-executed conditionals are <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else, you will have to check the negated condition.
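A Python sketch of how <code>select_first()</code> behaves (variables from conditionals that did not run act like <code>None</code> here; this is an analogy, not Toil's implementation):

```python
# Python sketch of WDL's select_first(): return the first non-null
# value in the array, or fail if every value is null.
def select_first(values):
    for value in values:
        if value is not None:
            return value
    raise ValueError("select_first: all values were null")

fizzbuzz_override = None  # the optional input was not provided
print(select_first([fizzbuzz_override, "FizzBuzz"]))  # FizzBuzz
```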
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access. We only make the call when we aren't producing one of the special strings instead.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code> (Bash-like substitution, but with a tilde) to place WDL variables into your command script. So let's add a command that will echo back the number, so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
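In Python terms (an analogy, not Toil's implementation), the output capture works like this: <code>echo</code> writes the digits plus a trailing newline, and <code>read_string(stdout())</code> reads that back and strips the newline.

```python
# What the task produces: echo writes the digits plus a trailing
# newline; read_string(stdout()) reads it back and strips that
# trailing newline, yielding the bare string.
the_number = 7
captured = f"{the_number}\n"        # stdout contents from: echo ~{the_number}
the_string = captured.rstrip("\n")  # WDL: read_string(stdout())
print(the_string)  # 7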
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We'll also tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, optionally suggesting that it be <code>SSD</code> storage.
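The three space-separated fields of such a string break down like this (the parsing code below is purely illustrative, not Toil's actual parser):

```python
# Hypothetical breakdown of a Cromwell-style disks string:
# "<mount> <size in GB> <medium>".
spec = "local-disk 1 SSD"
kind, size, medium = spec.split()
size_gb = int(size)  # number of gigabytes requested
print(kind, size_gb, medium)  # local-disk 1 SSD
```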
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need this, but you do need it in 1.1, and Toil doesn't yet support omitting it, so we're going to make one. We need to collect together all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
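To predict what should land in <code>fizzbuzz_out.json</code>, the workflow's logic can be sketched in plain Python (an illustration of the WDL above, not Toil code):

```python
# Plain-Python rendering of the FizzBuzz workflow logic, to predict
# the fizzbuzz_results array for a given item_count.
def fizzbuzz(item_count, to_fizz=3, to_buzz=5, fizzbuzz_override=None):
    results = []
    for i in range(item_count):
        n = i + 1  # scatter body: Int one_based = i + 1
        if n % to_fizz == 0 and n % to_buzz == 0:
            results.append(fizzbuzz_override or "FizzBuzz")
        elif n % to_fizz == 0:
            results.append("Fizz")
        elif n % to_buzz == 0:
            results.append("Buzz")
        else:
            results.append(str(n))  # the stringify_number task
    return results

print(fizzbuzz(20)[:5])  # ['1', '2', 'Fizz', '4', 'Buzz']
```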
==Debugging Workflows==
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
===Debugging Options===
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs are reproduced like this.
===Reading the Log===
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code. This happens either when the command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard error at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stderr.txt:
and lines like:
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard output at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stdout.txt:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
===Reproducing Problems===
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
===More Ways of Finding Files===
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try to find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
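This decoding can also be scripted with Python's standard library (the URI below is the example from the log above):

```python
from urllib.parse import unquote

# The toilfile: URI from the worker log, with percent-encoding intact.
uri = ("toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob"
       "%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351"
       "%2FSample.chr14.bam/Sample.chr14.bam")
decoded = unquote(uri)
# The job-store-relative path is everything after the last colon.
relative_path = decoded.rsplit(":", 1)[-1]
print(relative_path)
```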
===Using Development Versions of Toil===
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
===Frequently Asked Questions===
====I am getting warnings about <code>XDG_RUNTIME_DIR</code>====
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
====Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!====
Toil will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
==Additional WDL resources==
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
c0939f6ac8d6e2014332a60ef6c2a0e11f44116b
How to access the public servers
0
11
401
399
2023-07-28T18:17:41Z
Weiler
3
/* Server Types and Management */
wikitext
text/x-wiki
== How to Gain Access to the Public Genomics Institute Compute Servers ==
If you need access to the Genomics Institute compute servers please complete this request form:
https://app.smartsheet.com/b/form/a76dbd90ba0240ab9ea9d39b390586ce. There are two parts in this process.
1. For the user, please fill in ALL required fields and submit.
2. For the Sponsor/PI - you will receive an email from Smartsheet. Please fill in all required fields and submit.
We will receive your completed request and we will create your account and go over the details via a short zoom meeting with you.
== Account and Storage Cost ==
Costs for having an active UNIX account and for storage (per TB) are listed in this document, under "Genomics Project Support", specifically "Genomics IT Systems User Support per user" and "Genomics Data Storage per TB":
https://planning.ucsc.edu/budget/rates-and-assessments/recharge-rates/docs/2021-22-approved-recharge-rates.pdf
== Account Expiration ==
Your UNIX account will have an expiration date associated with it after creation, as requested by your sponsor. Please take note of this expiration date when your account is created.
You will receive notice by email when your account is about to expire. To renew, simply ask the PI that sponsored you (which will be included in the notice) to email '''cluster-admin@soe.ucsc.edu''' requesting that your account be renewed for another year, or any other requested amount of time.
If your account expires, the account will be suspended and you will no longer be able to login or view any data you may have in our systems. Any automated scripts (owned by you) that run via cron or other mechanisms will cease to function.
== Server Types and Management ==
You can connect to our public compute servers via SSH:
'''courtyard.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space, Ubuntu 22.04.2
'''park.gi.ucsc.edu''': 256GB RAM, 32 cores, 5TB local scratch space, Ubuntu 22.04.2
These servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on them, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory will be located at "/public/home/username" and has a 30GB quota. The group storage directories are created per PI, and each group directory has a 15TB quota. For example, if David Haussler is the PI that you report to directly, then the directory would exist as /public/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so you must be wary of everyone's data usage and share the 15TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the '/usr/bin/viewquota' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /public/groups/hausslerlab for example, you would do:
$ viewquota hausslerlab
Project quota on /export (/dev/mapper/export)
Project ID     Used  Soft  Hard  Warn/Grace
-----------    ----  ----  ----  ----------
hausslerlab    1.8T   15T   16T  00 [------]
== Actually Doing Work and Computing ==
When doing research, running jobs, and the like, please be mindful of your resource consumption on the server you are on. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are not sure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Also, before running your jobs, check what is already happening on the server with the 'top' command to see who and what else is running and what resources are already being consumed. If, after starting a process, you realize that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
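As a concrete starting point, the kind of pre-flight check described above might look like this (a sketch: the 80% load threshold is illustrative, not site policy):

```shell
#!/usr/bin/env bash
# Quick pre-flight check before launching heavy work on a shared server.
cores=$(getconf _NPROCESSORS_ONLN)                           # CPU cores here
load=$(cut -d ' ' -f1 /proc/loadavg 2>/dev/null || echo 0)   # 1-min load avg
echo "cores=$cores load=$load"
# If the 1-minute load is already near the core count, the server is busy.
if awk -v l="$load" -v c="$cores" 'BEGIN { exit !(l < c * 0.8) }'; then
  echo "load looks OK - still start small"
else
  echo "server is busy - check top before adding more work"
fi
```

This only looks at CPU load; also keep an eye on memory in 'top' before scaling up.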
== Serving Files to the Public via the Web ==
If you want to set up a web page on courtyard, or serve files over HTTP from there, do the following:
mkdir /public/home/''your_username''/public_html
chmod 755 /public/home/''your_username''/public_html
Put data in the public_html directory. The URL will be:
http://public.gi.ucsc.edu/''~username''/
== /data/scratch Space on the Servers ==
Each server will generally have a local /data/scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /data/scratch is not backed up, and the data there could disappear in the event of a disk failure or anything else. Do not store important data there. If it is important, it should be moved somewhere else very soon after creation.
717a7ba41908aa9f3cb3ea480de4c787cb267841
MediaWiki:Sidebar
8
5
391
74
2023-07-16T03:46:45Z
Weiler
3
wikitext
text/x-wiki
* navigation
** Genomics Institute Computing Information|Genomics Institute Computing Information
** recentchanges-url|recentchanges
** helppage|help
1c42f6c40ffb28cc861ce5f20de7ce9c751807ce
MediaWiki:Common.css
8
46
402
2023-07-29T18:20:47Z
Weiler
3
Created page with "/* CSS placed here will be applied to all skins */ .mw-body-content { line-height: 5; }"
css
text/css
/* CSS placed here will be applied to all skins */
.mw-body-content {
line-height: 5;
}
da2f31e0a52c2961d80ccdaf69775cefeec238d1
Firewalled Computing Resources Overview
0
41
411
400
2023-08-07T14:44:25Z
Weiler
3
/* The Phoenix Cluster */
wikitext
text/x-wiki
== Doing Work and Computing ==
When doing research, running jobs, and the like, please be mindful of your resource consumption on the server you are on. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are not sure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Also, before running your jobs, check what is already happening on the server with the 'top' command to see who and what else is running and what resources are already being consumed. If, after starting a process, you realize that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the shared compute servers behind the VPN. The DNS suffix for all machines is ".prism". So, "mustard" would have a full DNS name of "mustard.prism":
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Node Name
! Operating System<br />
! CPU Cores
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | mustard
| style="text-align:left;" | Ubuntu 22.04
| 160
| 1.5 TB
| 10 Gb/s
| 9 TB
|-
| style="text-align:left;" | emerald
| style="text-align:left;" | Ubuntu 22.04
| 64
| 1 TB
| 10 Gb/s
| 690 GB
|-
| style="text-align:left;" | crimson
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|-
| style="text-align:left;" | razzmatazz
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|}
These ''shared'' servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== The Firewall ==
All servers in this environment are behind a firewall, so you must connect to the VPN in order to access them; they are not reachable from the greater Internet without it. You can, however, still make outbound connections from them to other servers on the Internet to copy data in, sync git repos, and the like. Only inbound connections are blocked. All machines behind the firewall have the private domain name suffix "*.prism".
== The Phoenix Cluster ==
This is a cluster of ~22 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2TB RAM and 128 cores, although the cluster is heterogeneous and has multiple node types. You interact with the Phoenix Cluster via the Slurm job scheduler, and you must specifically request access to use Slurm on it: just email '''cluster-admin@soe.ucsc.edu''' for access.
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold;"
! Node Name
! Operating System<br />
! CPU Cores
! GPUs/Type
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | phoenix-00
| style="text-align:left;" | Ubuntu 22.04
| 256
| 8 / Nvidia A100
| 1 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[01-05]
| style="text-align:left;" | Ubuntu 22.04
| 128
| 8 / Nvidia A5500
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[06-08]
| style="text-align:left;" | Ubuntu 22.04
| 128
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-09
| style="text-align:left;" | Ubuntu 22.04
| 40
| 8 / Nvidia 1080ti
| 256 GB
| 10 Gb/s
| 1.5 TB NVMe
|-
| style="text-align:left;" | phoenix-10
| style="text-align:left;" | Ubuntu 22.04
| 36
| 4 / Nvidia 2080ti
| 385 GB
| 10 Gb/s
| 3 TB NVMe
|-
| style="text-align:left;" | phoenix-[11-21]
| style="text-align:left;" | Ubuntu 22.04
| 128
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|}
The cluster head node, from which all jobs are submitted via the Slurm job scheduling framework, is '''phoenix.prism'''. To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
For scratch on the cluster, TMPDIR will be set to /data/tmp. That area is cleaned often, so don't store any data there that isn't being used by your jobs.
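For example, a job script might stage its working files in <code>$TMPDIR</code> and copy anything worth keeping back to shared storage before exiting (a sketch: the <code>#SBATCH</code> values, names, and paths are illustrative, not a site recommendation):

```shell
#!/bin/bash
#SBATCH --job-name=demo
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
# Work in node-local scratch; on the cluster, TMPDIR points at /data/tmp.
workdir="${TMPDIR:-/tmp}/demo.$$"
mkdir -p "$workdir"
echo "working in $workdir"
# ... run your tool here, writing output into "$workdir" ...
echo "example output" > "$workdir/result.txt"
# Copy results somewhere durable before the job ends; scratch is cleaned often.
mkdir -p "$HOME/demo_results"
cp "$workdir/result.txt" "$HOME/demo_results/"
rm -rf "$workdir"
```

Anything left in scratch after the job finishes should be treated as already gone.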
5586859aa9934e49abc1636c3023441f8ccdca75
Phoenix WDL Tutorial
0
45
412
398
2023-08-10T15:06:42Z
Anovak
4
/* Configuring Toil for Phoenix */
wikitext
text/x-wiki
=Tutorial: Getting Started with WDL Workflows on Phoenix=
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows on the Genomics Institute's Phoenix cluster. By the end, you will be able to run workflows on Phoenix with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
==Setup==
Before we begin, you will need a computer you can install software on, and the ability to connect to other machines over SSH.
===Getting VPN access===
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
===Connecting to Phoenix===
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should '''not''' run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different from your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
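If you find yourself typing that a lot, an entry in <code>~/.ssh/config</code> on your own computer shortens it (the username here is a placeholder for your cluster username):

```
Host phoenix
    HostName phoenix.prism
    User flastname
```

After that, <code>ssh phoenix</code> is enough.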
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
===Installing Toil with WDL support===
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter ''will not'' look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''' to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
===Configuring Toil for Phoenix===
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. So, use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${HOME}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${HOME}/.cache/miniwdl"' >>~/.bashrc
After that, ''log out and log back in again'', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day], so it is important not to skip this step.
==Running an existing workflow==
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
===Preparing an input file===
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
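Since a malformed inputs file is a common source of confusing errors, it can be worth validating the JSON before handing it to <code>toil-wdl-runner</code> (a sketch; <code>python3</code> is assumed to be available on the node):

```shell
# Recreate the inputs file and confirm it parses as valid JSON.
echo '{"hello_caller.who": "./names.txt"}' > inputs.json
python3 -m json.tool inputs.json
```

If the file is not valid JSON, <code>json.tool</code> exits nonzero and points at the problem.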
===Testing at small scale on a single machine===
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --pty bash -i
This will start a new shell; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
===Running at larger scale===
Back on the head node, let's prepare to run a larger run. Greeting 3 people isn't cool, let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
==Writing your own workflow==
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
===Writing the file===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner, used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is '''not''' optional, and there is no default value, then the user's inputs file '''must''' specify a value for it in order for the workflow to run.
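So for this workflow, an inputs file only has to supply <code>item_count</code>; everything else has a default or is optional. A minimal example (the filename is arbitrary):

```shell
# item_count is the one required input for the FizzBuzz workflow.
echo '{"FizzBuzz.item_count": 15}' > fizzbuzz_inputs.json
```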
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
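For instance, a conditional ''expression'' can pick between two values inline, while the statement form only ever declares a variable inside its body (a small illustration, not part of the tutorial workflow):

```wdl
# Conditional expression: both branches required, usable anywhere a value is.
String parity = if (one_based % 2 == 0) then "even" else "odd"

# Conditional statement: body only, no else; negate the test if you need one.
if (one_based % 2 != 0) {
    String odd_label = "odd"
}
```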
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access (here, only for the numbers where we didn't produce a noise instead).
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
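As an aside, the newline-stripping matters because <code>echo</code> terminates its output with a newline. Bash command substitution behaves the same way; here is that behavior in plain Bash (an analogy, not WDL):

```shell
# Command substitution strips the trailing newline from echo's output,
# much as read_string(stdout()) strips it from the task's standard output.
result="$(echo 42)"
echo "[$result]"
```

This prints <code>[42]</code>, with no embedded newline.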
We're also going to add a <code>runtime</code> section to our task, to specify resource requirements, and to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you would probably want to expose optional inputs that let the user control each task's resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> attribute is a little unusual: it isn't in the WDL spec, but Toil supports Cromwell-style strings that request a <code>local-disk</code> of a given number of gigabytes, optionally specifying that it should be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need this, but it is required in WDL 1.1, and Toil doesn't yet support omitting it, so we're going to write one. We need to collect all the strings that came out of the different iterations of our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
==Debugging Workflows==
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
===Debugging Options===
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs are reproduced like this.
===Reading the Log===
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit status. That happens either when the command is written incorrectly, or when the tool you are running detects a problem and reports an error.
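To see what "nonzero exit status" means concretely, here is a plain Bash illustration (not Toil-specific):

```shell
# An exit status of 0 means success; anything else means failure.
# `false` always fails with status 1; `true` always succeeds with 0.
false || status_fail=$?    # capture the failure without stopping the script
true; status_ok=$?
echo "false: $status_fail, true: $status_ok"
```

Any task command that ends the same way <code>false</code> does will trigger this error from Toil.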
Go up higher in the log until you find lines that look like:
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard error at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stderr.txt:
And
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard output at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stdout.txt:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
===Reproducing Problems===
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that reproduces the problem. In addition to getting the standard output and standard error logs as described above, you may also need the tool's input files in order to do this. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
===More Ways of Finding Files===
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
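The decode-and-trim steps above can also be done at the command line. A sketch using Python's standard library from the shell (the URI is the example from the log line above):

```shell
# URL-decode the toilfile URI from the log.
uri='toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam'
decoded=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote(sys.argv[1]))' "$uri")
# Everything after the last colon is the path relative to the job store.
relpath="${decoded##*:}"
echo "$relpath"
```

You can then append <code>$relpath</code> to your <code>--jobStore</code> path to locate the file.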
===Using Development Versions of Toil===
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
===Frequently Asked Questions===
====I am getting warnings about <code>XDG_RUNTIME_DIR</code>====
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
====Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!====
Toil will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>. When running in single machine mode, these worker log messages end up in the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
==Additional WDL resources==
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
b7b2ccf8c4b9e0a9f4799c4d132bf17ee7bf04e1
413
412
2023-08-10T15:07:08Z
Anovak
4
/* Configuring Toil for Phoenix */
wikitext
text/x-wiki
=Tutorial: Getting Started with WDL Workflows on Phoenix=
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows on the Genomics Institute's Phoenix cluster. By the end, you will be able to run workflows on Phoenix with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
==Setup==
Before we begin, you will need a computer you can install software on, and the ability to connect to other machines over SSH.
===Getting VPN access===
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
===Connecting to Phoenix===
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should *not* run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
===Installing Toil with WDL support===
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter *will not* look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''', to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
===Configuring Toil for Phoenix===
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. So, use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${HOME}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${HOME}/.cache/miniwdl"' >>~/.bashrc
After that, '''log out and log back in again''', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day], so it is important not to skip this step.
==Running an existing workflow==
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
===Preparing an input file===
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's put it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an *inputs file*, which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
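A malformed inputs file causes confusing errors at startup, so it can help to validate the JSON before running. One way, using Python's standard library (an optional sanity check, not one of the tutorial's required commands):

```shell
# Re-create the inputs file from above, then validate it. json.tool
# pretty-prints valid JSON, and exits nonzero while pointing at the
# first error if the file is malformed.
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
python3 -m json.tool inputs.json
```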
===Testing at small scale on a single machine===
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --pty bash -i
This will start a new shell; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
===Running at larger scale===
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
==Writing your own workflow==
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
===Writing the file===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner, used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is *not* optional, and there is no default value, then the user's inputs file *must* specify a value for it in order for the workflow to run.
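For example, a hypothetical inputs file for this workflow could satisfy the required input, override one default, and fill in the optional input (the filename and values here are illustrative only):

```shell
# FizzBuzz.item_count is required; FizzBuzz.to_buzz overrides its default
# of 5; FizzBuzz.fizzbuzz_override fills the optional String? input.
# FizzBuzz.to_fizz is omitted, so it keeps its default of 3.
echo '{"FizzBuzz.item_count": 15, "FizzBuzz.to_buzz": 4, "FizzBuzz.fizzbuzz_override": "FizzBang"}' >fizzbuzz_inputs.json
cat fizzbuzz_inputs.json
```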
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only *once* in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional *expressions* with a <code>then</code> and an <code>else</code>, but conditional *statements* only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
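For contrast, a conditional *expression* packs both branches into one declaration. A sketch (reusing the <code>one_based</code> and <code>to_fizz</code> variables that appear in the code below):

```
String fizz_label = if (one_based % to_fizz == 0) then "Fizz" else "plain"
```

A conditional *statement*, like the <code>if</code> blocks below, has no <code>else</code>, so the "neither Fizz nor Buzz" case must be tested with its own negated condition.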
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the call with <code>.</code> access, but only in iterations where the call actually ran; everywhere else its outputs are <code>null</code>.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We're also going to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, optionally suggesting that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need one, but WDL 1.1 requires it, and Toil doesn't yet support leaving it out, so we're going to write one. We need to collect all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
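As a rough analogy (plain Python, not WDL, with hypothetical names), the scatter behaves like a list comprehension: each iteration defines one <code>result</code>, and referencing the variable outside the scatter yields the gathered array.

```python
# Rough Python analogy (not WDL): each scatter iteration defines one
# "result"; outside the scatter, the variable is the gathered list.
def one_result(n):
    if n % 3 == 0 and n % 5 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)

# Like the workflow's scatter over range(item_count) with item_count = 20
fizzbuzz_results = [one_result(i + 1) for i in range(20)]
print(fizzbuzz_results[:5])  # ['1', '2', 'Fizz', '4', 'Buzz']
```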
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
==Debugging Workflows==
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
===Debugging Options===
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs are reproduced like this.
===Reading the Log===
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which will happen when either the command line command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard error at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stderr.txt:
And
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard output at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stdout.txt:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
===Reproducing Problems===
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need your tool's input files. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
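Joining the two parts is plain path concatenation; a sketch in Python (using the example job store path from above):

```python
import os.path

# The --jobStore value you passed to toil-wdl-runner
job_store = "/private/groups/patenlab/anovak/jobstore"
# The Toil file ID from the "Downloaded file" log line
file_id = ("files/for-job/kind-WDLTaskJob/instance-b4c5x6hq"
           "/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam")

# The on-disk location is just the file ID resolved against the job store
on_disk = os.path.join(job_store, file_id)
print(on_disk)
```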
===More Ways of Finding Files===
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try to find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
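Instead of a web decoder, the same decode-and-split can be done in a few lines of Python with the standard library:

```python
from urllib.parse import unquote

# The toilfile: URI copied from the "Virtualized ... as WDL file" log line
uri = ("toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob"
       "%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351"
       "%2FSample.chr14.bam/Sample.chr14.bam")

# URL-decode the percent escapes (%3A -> ':', %2F -> '/')
decoded = unquote(uri)
# The part after the last colon is the path relative to the job store
relative_path = decoded.rsplit(":", 1)[-1]
print(relative_path)
```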
===Using Development Versions of Toil===
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
===Frequently Asked Questions===
====I am getting warnings about <code>XDG_RUNTIME_DIR</code>====
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
====Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!====
Toil will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
==Additional WDL resources==
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
d0dfcc4c14c0ada68994703d7dc0a7db1d78400b
414
413
2023-08-10T15:09:01Z
Anovak
4
/* Writing your own workflow */
wikitext
text/x-wiki
=Tutorial: Getting Started with WDL Workflows on Phoenix=
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows on the Genomics Institute's Phoenix cluster. By the end, you will be able to run workflows on Phoenix with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
==Setup==
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
===Getting VPN access===
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
===Connecting to Phoenix===
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should '''not''' run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
===Installing Toil with WDL support===
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter '''will not''' look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''', to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
===Configuring Toil for Phoenix===
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. So, use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${HOME}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${HOME}/.cache/miniwdl"' >>~/.bashrc
After that, '''log out and log back in again''', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day], so it is important not to skip this step.
==Running an existing workflow==
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
To start, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
===Preparing an input file===
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's put it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
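The key convention can be sketched programmatically; a minimal Python example (using the <code>json</code> module, with this workflow's names):

```python
import json

# Keys are "<workflow name>.<input name>"; file values are paths
# relative to the inputs file (absolute paths and URLs also work).
workflow_name = "hello_caller"
inputs = {workflow_name + ".who": "./names.txt"}

# This produces the same content as the echo command above
print(json.dumps(inputs))
```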
===Testing at small scale on a single machine===
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --pty bash -i
This will start a new shell; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
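Those escape sequences are just JSON's ASCII-safe encoding of non-ASCII characters; any JSON parser turns them back into the real characters. A quick Python check:

```python
import json

# A snippet like what toil-wdl-runner prints, with \u escapes
raw = '{"hello_caller.messages": ["Hello, Gershom \\u0160arlota!"]}'

# json.loads() decodes \u0160 back into the character it stands for
data = json.loads(raw)
print(data["hello_caller.messages"][0])  # Hello, Gershom Šarlota!
```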
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
===Running at larger scale===
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
==Writing your own workflow==
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
===Writing the file===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is '''not''' optional, and there is no default value, then the user's inputs file '''must''' specify a value for it in order for the workflow to run.
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
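In Python terms (a rough analogy, not WDL), the scatter body runs once per element, and each variable declared inside is seen outside as a list:

```python
# Rough Python analogy (not WDL): the scatter body runs per element,
# and "one_based" is gathered into a list outside the scatter.
numbers = list(range(20))             # WDL: Array[Int] numbers = range(item_count)
one_based = [i + 1 for i in numbers]  # WDL: Int one_based = i + 1
print(one_based[:3])  # [1, 2, 3]
```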
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only '''once''' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else, you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
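The <code>select_first()</code> trick can be mimicked in Python (a rough analogy, not WDL): a variable in a conditional that didn't run is <code>null</code>, and <code>select_first()</code> picks the first non-null value.

```python
# Rough Python analogy (not WDL): select_first() returns the first
# non-null value; variables from un-executed conditionals are null (None).
def select_first(values):
    return next(v for v in values if v is not None)

n = 15  # one scatter iteration's one_based value
fizz = "Fizz" if n % 3 == 0 else None
buzz = "Buzz" if n % 5 == 0 else None
fizzbuzz = "FizzBuzz" if (fizz is not None and buzz is not None) else None

print(select_first([fizzbuzz, fizz, buzz, str(n)]))  # FizzBuzz
```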
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access. On iterations where the call didn't run (because we made a noise instead), its outputs are <code>null</code>.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
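The capture step is worth spelling out: in Python terms (a rough analogy, not WDL), <code>stdout()</code> hands you the captured output as a file, and <code>read_string()</code> reads it back while dropping the trailing newline that <code>echo</code> adds.

```python
# Rough Python analogy (not WDL): stdout() gives the command's captured
# output; read_string() reads it back, stripping the trailing newline.
captured = "42\n"                  # what `echo 42` writes to stdout
the_string = captured.rstrip("\n") # what read_string(stdout()) returns
print(repr(the_string))  # '42'
```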
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We're also going to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, optionally suggesting that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string]
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren's supposed to need this, but you do need it in 1.1 and Toil doesn't actually support not having one yet, so we're going to make one. We need to collect together all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string]
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
==Debugging Workflows==
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
===Debugging Options===
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs are reproduced like this.
===Reading the Log===
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which will happen when either the command line command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard error at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stderr.txt:
And
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard output at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stdout.txt:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
===Reproducing Problems===
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this. In the log of your failing Toil taks, look for likes like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with </code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
===More Ways of Finding Files===
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
===Using Development Versions of Toil===
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
===Frequently Asked Questions===
====I am getting warnings about <code>XDG_RUNTIME_DIR</code>====
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
====Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!====
Toil will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
==Additional WDL resources==
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL dcoumentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
=Tutorial: Getting Started with WDL Workflows on Phoenix=
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows on the Genomics Institute's Phoenix cluster. By the end, you will be able to run workflows on Phoenix with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
==Setup==
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
===Getting VPN access===
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
===Connecting to Phoenix===
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should '''not''' run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different from your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
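If your usernames differ, you can also record the cluster username in your SSH client configuration, so that a plain <code>ssh phoenix.prism</code> works. This is a standard SSH feature; <code>flastname</code> below is a placeholder for your actual cluster username. Add an entry like this to <code>~/.ssh/config</code> on your computer:

```
Host phoenix.prism
    User flastname
```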
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
===Installing Toil with WDL support===
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter '''will not''' look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''', to restart bash and pick up the change.
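If you want to pick up the change in your current shell immediately, you can also apply the same setting by hand (the re-login is still the reliable way to make sure every future session gets it):

```shell
# Put ~/.local/bin on the PATH for the current shell session only
export PATH="${HOME}/.local/bin:${PATH}"
# List the PATH entries and confirm ~/.local/bin is among them
echo "$PATH" | tr ':' '\n' | grep "/.local/bin"
```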
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
===Configuring Toil for Phoenix===
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. So, use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${HOME}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${HOME}/.cache/miniwdl"' >>~/.bashrc
After that, '''log out and log back in again''', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day], so it is important not to skip this step.
==Running an existing workflow==
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
===Preparing an input file===
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
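A malformed inputs file produces confusing errors, so it can be worth checking the JSON syntax before running the workflow. Any JSON tool works; for example, the <code>json.tool</code> module that ships with Python will refuse to pretty-print a file that doesn't parse:

```shell
# Write the inputs file, then pretty-print it to prove it parses as JSON
echo '{"hello_caller.who": "./names.txt"}' > inputs.json
python3 -m json.tool inputs.json
```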
===Testing at small scale on a single machine===
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --pty bash -i
This will start a new shell; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
===Running at larger scale===
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
==Writing your own workflow==
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
===Writing the file===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
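For example, an inputs file for this workflow could set the required input, override a default, and supply the optional one (the values here are just illustrative):

```
{
  "FizzBuzz.item_count": 15,
  "FizzBuzz.to_buzz": 7,
  "FizzBuzz.fizzbuzz_override": "FizzBuzz!"
}
```

Omitted keys fall back to their defaults (<code>to_fizz</code> stays 3), and an omitted optional input like <code>fizzbuzz_override</code> is simply <code>null</code>.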
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
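If it helps, the scatter above behaves like a Python list comprehension (a Python analogy only, not WDL): the body runs once per input element, and each variable declared inside collects into an array.

```python
# Python sketch of the WDL scatter: run the body for every element,
# collecting the declared variable into an array of results
item_count = 5
numbers = list(range(item_count))      # WDL: Array[Int] numbers = range(item_count)
one_based = [i + 1 for i in numbers]   # WDL scatter body: Int one_based = i + 1
print(one_based)                       # [1, 2, 3, 4, 5]
```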
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
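To make the distinction concrete, here is a small standalone sketch (illustrative WDL, not part of our FizzBuzz file):

```
version 1.0
workflow ParityExample {
    input {
        Int n = 4
    }
    # Conditional expression: has both branches and produces a value directly
    String parity = if (n % 2 == 0) then "even" else "odd"
    # Conditional statements: body only, so an "else" needs the negated condition
    if (n % 2 == 0) {
        String even_note = "divisible by two"
    }
    if (n % 2 != 0) {
        String odd_note = "not divisible by two"
    }
    output {
        String note = select_first([even_note, odd_note])
    }
}
```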
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the task's output values out with <code>.</code> access.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
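Before wiring this into the file, it can help to check the <code>select_first()</code> logic. Here is the same decision table in Python (an analogy only: <code>None</code> stands in for WDL's <code>null</code> from un-executed conditionals, and <code>plain</code> stands in for the <code>stringify_number</code> task call):

```python
def fizzbuzz_word(one_based, to_fizz=3, to_buzz=5, fizzbuzz_override=None):
    # Each branch variable starts null, like a WDL declaration in a skipped conditional
    fizz = "Fizz" if one_based % to_fizz == 0 else None
    fizzbuzz = None
    if fizz is not None and one_based % to_buzz == 0:
        fizzbuzz = fizzbuzz_override if fizzbuzz_override is not None else "FizzBuzz"
    buzz = "Buzz" if one_based % to_buzz == 0 else None
    # Stand-in for the stringify_number task call
    plain = str(one_based) if fizz is None and buzz is None else None
    # WDL select_first(): the first non-null entry wins
    return next(v for v in [fizzbuzz, fizz, buzz, plain] if v is not None)

print([fizzbuzz_word(n) for n in range(1, 16)])
# ['1', '2', 'Fizz', '4', 'Buzz', 'Fizz', '7', '8', 'Fizz', 'Buzz', '11', 'Fizz', '13', '14', 'FizzBuzz']
```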
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in with <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements, and to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that request a <code>local-disk</code> of a certain number of gigabytes, optionally specifying that it should be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need this, but you do need it in 1.1 and Toil doesn't actually send your outputs anywhere yet if you don't have one, so we're going to make one. We need to collect together all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string]
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
==Debugging Workflows==
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
===Debugging Options===
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs are reproduced like this.
===Reading the Log===
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which will happen when either the command line command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard error at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stderr.txt:
And
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard output at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stdout.txt:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
===Reproducing Problems===
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this. In the log of your failing Toil taks, look for likes like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with </code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
===More Ways of Finding Files===
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
===Using Development Versions of Toil===
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
===Frequently Asked Questions===
====I am getting warnings about <code>XDG_RUNTIME_DIR</code>====
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
====Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!====
Toil will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
==Additional WDL resources==
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL dcoumentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
8ea9f68e9f63adeeab355f55d70b7538352f7b5f
416
415
2023-08-10T15:11:40Z
Anovak
4
/* Writing the file */
wikitext
text/x-wiki
=Tutorial: Getting Started with WDL Workflows on Phoenix=
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows on the Genomics Institute's Phoenix cluster. By the end, you will be able to run workflows on Phoenix with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
==Setup==
Before we begin, you will need a computer you can install software on, and the ability to connect to other machines over SSH.
===Getting VPN access===
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
===Connecting to Phoenix===
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should ''not'' run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
===Installing Toil with WDL support===
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter ''will not'' look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''', to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
===Configuring Toil for Phoenix===
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. So, use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${HOME}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${HOME}/.cache/miniwdl"' >>~/.bashrc
After that, '''log out and log back in again''', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day], so it is important not to skip this step.
==Running an existing workflow==
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
===Preparing an input file===
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
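For a workflow with more inputs, the same naming scheme simply repeats. Here is a sketch for an imaginary workflow and input names (not the self-test workflow):

```shell
# Hypothetical example: an inputs file for an imaginary workflow named
# "my_workflow" that takes a File input "reads" and an Int input "threads".
cat > example_inputs.json <<'EOF'
{
  "my_workflow.reads": "./sample.fastq",
  "my_workflow.threads": 4
}
EOF
# Sanity-check that the file parses as JSON before handing it to Toil
python3 -m json.tool example_inputs.json
```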
===Testing at small scale on a single machine===
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --pty bash -i
This will start a new shell; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
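Those <code>\u00f3</code> sequences are standard JSON Unicode escapes, so any JSON parser will decode them for you. For example, at the shell:

```shell
# Decode a JSON Unicode escape like \u00f3 with Python's json module.
printf '%s' '["Hello, Mridula Resurrecci\u00f3n!"]' \
  | python3 -c 'import json, sys; print(json.load(sys.stdin)[0])'
# prints: Hello, Mridula Resurrección!
```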
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
===Running at larger scale===
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
==Writing your own workflow==
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
===Writing the file===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
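For example, an inputs file for this workflow only has to set <code>item_count</code>; everything else can be left at its default or omitted. As a sketch (the file name here is arbitrary):

```shell
# item_count has no default, so it must be provided; to_fizz and to_buzz
# fall back to 3 and 5, and we choose to set the optional override.
cat > fizzbuzz_inputs.json <<'EOF'
{
  "FizzBuzz.item_count": 15,
  "FizzBuzz.fizzbuzz_override": "Fizz Buzz!"
}
EOF
```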
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
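If it helps to have a point of comparison, <code>select_first([fizzbuzz_override, "FizzBuzz"])</code> behaves much like Bash's default-value expansion (an analogy only, not actual WDL semantics):

```shell
# Analogy: use the override when it is set, else fall back to a default.
unset fizzbuzz_override
echo "${fizzbuzz_override:-FizzBuzz}"   # prints: FizzBuzz
fizzbuzz_override="Zazz"
echo "${fizzbuzz_override:-FizzBuzz}"   # prints: Zazz
```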
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> function returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
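Command substitution in Bash strips trailing newlines the same way, which can be a handy mental model for what <code>read_string()</code> does here:

```shell
# echo emits "5" plus a trailing newline; capturing it strips the newline,
# leaving the clean string "5", much like WDL's read_string().
captured="$(echo 5)"
printf '[%s]\n' "$captured"   # prints: [5]
```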
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We'll also tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little unusual; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, optionally hinting that it should be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need this, but WDL 1.1 requires it, and Toil doesn't actually deliver your outputs anywhere if you don't have one, so we're going to add one. We need to collect all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
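As a sanity check before running it, the whole workflow's logic can be sketched in plain Bash, using the default <code>to_fizz</code> and <code>to_buzz</code> values (this is just an illustration, not part of the workflow):

```shell
# Plain-Bash sketch of the same FizzBuzz logic, with the workflow's
# defaults (to_fizz=3, to_buzz=5) over the first 20 numbers.
item_count=20; to_fizz=3; to_buzz=5
for (( i = 1; i <= item_count; i++ )); do
  if (( i % to_fizz == 0 && i % to_buzz == 0 )); then
    echo "FizzBuzz"
  elif (( i % to_fizz == 0 )); then
    echo "Fizz"
  elif (( i % to_buzz == 0 )); then
    echo "Buzz"
  else
    echo "$i"
  fi
done
```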
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
==Debugging Workflows==
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
===Debugging Options===
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs are reproduced like this.
===Reading the Log===
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which will happen when either the command line command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard error at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stderr.txt:
And
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard output at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stdout.txt:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
===Reproducing Problems===
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this. In the log of your failing Toil taks, look for likes like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with </code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
===More Ways of Finding Files===
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
===Using Development Versions of Toil===
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
===Frequently Asked Questions===
====I am getting warnings about <code>XDG_RUNTIME_DIR</code>====
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
====Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!====
Toil will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
==Additional WDL resources==
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL dcoumentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
5eb91bfb9b19a9b8a4f8836865f7e440feeebb4f
417
416
2023-08-10T15:13:11Z
Anovak
4
/* Reproducing Problems */
wikitext
text/x-wiki
=Tutorial: Getting Started with WDL Workflows on Phoenix=
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows on the Genomics Institute's Phoenix cluster. By the end, you will be able to run workflows on Phoenix with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
==Setup==
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
===Getting VPN access===
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
===Connecting to Phoenix===
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should '''not''' run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
===Installing Toil with WDL support===
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter '''will not''' look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''' to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
===Configuring Toil for Phoenix===
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. So, use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${HOME}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${HOME}/.cache/miniwdl"' >>~/.bashrc
After that, '''log out and log back in again''', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day], so it is important not to skip this step.
==Running an existing workflow==
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
To start, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
===Preparing an input file===
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
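Since inputs files are plain JSON, you can also generate them programmatically instead of with <code>echo</code>. For example, this Python sketch writes the same <code>inputs.json</code> as above:

```python
import json

# Keys are "<workflow name>.<input name>"; values can be strings,
# numbers, booleans, arrays, or nested objects, matching the WDL types.
inputs = {"hello_caller.who": "./names.txt"}

with open("inputs.json", "w") as out:
    json.dump(inputs, out)
```

Generating the file this way also guarantees the JSON is syntactically valid, which is easy to get wrong by hand.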
===Testing at small scale single-machine===
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --pty bash -i
This will start a new shell; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
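Those <code>\uXXXX</code> sequences are just JSON's ASCII-safe escaping of non-ASCII characters; any JSON parser will turn them back into the real characters. A quick Python illustration:

```python
import json

# "\u0160" is JSON's escape for the character "Š"; parsing the JSON
# string recovers the original text.
message = json.loads('"Hello, Gershom \\u0160arlota!"')
print(message)  # Hello, Gershom Šarlota!
```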
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
===Running at larger scale===
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
==Writing your own workflow==
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
===Writing the file===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
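As an analogy only (WDL scatters run their iterations in parallel and are not Python), the way a scatter builds arrays behaves like a list comprehension over the range:

```python
item_count = 20  # example value for the workflow input

# WDL's range(item_count) gives [0, 1, ..., item_count - 1],
# just like Python's range().
numbers = list(range(item_count))

# Each scatter iteration declares one_based; outside the scatter,
# those per-iteration values are gathered into an array:
one_based = [i + 1 for i in numbers]

print(one_based[:5])  # [1, 2, 3, 4, 5]
```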
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we put it and a default value into an array, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
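To see what these nested conditionals compute, here is a rough Python model of one scatter iteration (just an illustration; the function names are made up, and <code>select_first()</code> is modeled as "first non-None value"). It also includes the plain-number case, which the workflow handles with the <code>stringify_number</code> task introduced next:

```python
def select_first(values):
    """Rough model of WDL's select_first(): first non-null value."""
    for value in values:
        if value is not None:
            return value
    raise ValueError("all values were null")

def fizzbuzz_line(one_based, to_fizz=3, to_buzz=5, fizzbuzz_override=None):
    # Variables declared in un-executed conditionals stay null (None).
    fizz = "Fizz" if one_based % to_fizz == 0 else None
    fizzbuzz = None
    if one_based % to_fizz == 0 and one_based % to_buzz == 0:
        fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
    buzz = "Buzz" if one_based % to_buzz == 0 else None
    # Plain numbers get stringified (the stringify_number task's job).
    plain = str(one_based) if fizz is None and buzz is None else None
    return select_first([fizzbuzz, fizz, buzz, plain])

print(fizzbuzz_line(15))  # FizzBuzz
```

Note the ordering in the final <code>select_first()</code>: <code>fizzbuzz</code> must come before <code>fizz</code> and <code>buzz</code>, or multiples of both would be reported as plain <code>"Fizz"</code>.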
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, but only when we didn't make a noise instead.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
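The trailing-newline behavior of <code>read_string()</code> can be modeled in Python like this (an approximation for illustration, not how Toil implements it):

```python
import tempfile

def read_string(path):
    # Rough model of WDL's read_string(): read the whole file and
    # strip trailing line terminators, per the spec section linked above.
    with open(path) as f:
        return f.read().rstrip("\r\n")

# Model the task's command: `echo 7` writes "7\n" to standard output.
with tempfile.NamedTemporaryFile("w", delete=False, suffix=".txt") as f:
    f.write("7\n")
    stdout_path = f.name

print(read_string(stdout_path))  # 7
```

Without that stripping, the captured string would carry the newline that <code>echo</code> appends.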
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements, and to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, which may suggest that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need one, but WDL 1.1 requires it, and Toil doesn't actually send your outputs anywhere yet if you don't have one, so we're going to make one. We need to collect together all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
==Debugging Workflows==
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
===Debugging Options===
When debugging a workflow, run it with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code>, so that the stored files shipped between jobs end up somewhere you can inspect them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs are reproduced like this.
===Reading the Log===
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which happens either when the command is written wrong, or when the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard error at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stderr.txt:
And
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard output at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stdout.txt:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
===Reproducing Problems===
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need your tool's input files in order to do this. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
===More Ways of Finding Files===
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
===Using Development Versions of Toil===
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
===Frequently Asked Questions===
====I am getting warnings about <code>XDG_RUNTIME_DIR</code>====
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
====Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!====
Toil will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
==Additional WDL resources==
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
6129219b37718c27ee5b0696f18e0742a54753e3
418
417
2023-08-10T15:14:24Z
Anovak
4
/* Frequently Asked Questions */
wikitext
text/x-wiki
=Tutorial: Getting Started with WDL Workflows on Phoenix=
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows on the Genomics Institute's Phoenix cluster. By the end, you will be able to run workflows on Phoenix with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
==Setup==
Before we begin, you will need a computer to work at that you can install software on, and the ability to connect to other machines over SSH.
===Getting VPN access===
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
===Connecting to Phoenix===
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should '''not''' run actual work on it; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different from your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
===Installing Toil with WDL support===
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter '''will not''' look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''', to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
===Configuring Toil for Phoenix===
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. So, use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${HOME}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${HOME}/.cache/miniwdl"' >>~/.bashrc
After that, '''log out and log back in again''', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day], so it is important not to skip this step.
==Running an existing workflow==
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
===Preparing an input file===
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
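If a workflow takes many inputs, it can be easier to generate the inputs file with a short script than with <code>echo</code>. A minimal Python sketch, using the same file names as above:

```python
import json

# Keys are "<workflow name>.<input name>"; values for File inputs are
# paths relative to the inputs file's location (absolute paths and
# URLs also work).
inputs = {"hello_caller.who": "./names.txt"}

with open("inputs.json", "w") as f:
    json.dump(inputs, f)
```

Using <code>json.dump()</code> also guarantees the quoting and escaping are valid JSON, which is easy to get wrong by hand.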
===Testing at small scale on a single machine===
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --pty bash -i
This will start a new shell; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
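The <code>\u00f3</code>-style sequences in the printed JSON are standard JSON Unicode escapes, and any JSON parser will turn them back into the original characters. For example, in Python (with a hypothetical <code>greeting</code> key standing in for the real output structure):

```python
import json

# A JSON fragment with a \u00f3 escape, like the workflow's output
fragment = '{"greeting": "Hello, Mridula Resurrecci\\u00f3n!"}'

decoded = json.loads(fragment)
print(decoded["greeting"])  # Hello, Mridula Resurrección!
```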
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
===Running at larger scale===
Back on the head node, let's prepare for a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
==Writing your own workflow==
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
===Writing the file===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
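As a loose analogy (this is not WDL semantics, just a familiar comparison), these rules resemble Python function parameters: a parameter with no default must be supplied by the caller, a default makes it optional to supply, and an <code>Optional</code> type additionally allows <code>None</code>. A hypothetical sketch mirroring the inputs above:

```python
from typing import Optional

# Rough Python analogy of the FizzBuzz workflow's input section.
# item_count has no default, so the caller must supply it, just as a
# WDL input with no default must appear in the inputs file.
def fizzbuzz_inputs(item_count: int,
                    to_fizz: int = 3,
                    to_buzz: int = 5,
                    fizzbuzz_override: Optional[str] = None) -> dict:
    return {"item_count": item_count, "to_fizz": to_fizz,
            "to_buzz": to_buzz, "fizzbuzz_override": fizzbuzz_override}
```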
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
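Since scatters work a bit like <code>map()</code>, the scatter above corresponds roughly to a Python list comprehension (an analogy only; the <code>item_count</code> value here is illustrative):

```python
item_count = 5  # illustrative value

numbers = range(item_count)           # like WDL range(item_count)
one_based = [i + 1 for i in numbers]  # the scatter body, applied to each i
print(one_based)  # [1, 2, 3, 4, 5]
```

The important difference is that in WDL, each iteration of the scatter body can run as a separate parallel job on the cluster.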
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
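The behavior of <code>select_first()</code> is easy to sketch outside WDL: return the first non-null element, and fail if every element is null. A hypothetical Python rendering:

```python
def select_first(values):
    """Return the first element that is not None; error if all are None."""
    for value in values:
        if value is not None:
            return value
    raise ValueError("select_first: all values were null")

print(select_first([None, "FizzBuzz", "Fizz"]))  # FizzBuzz
```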
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, but only in the branch where we didn't produce a Fizz or Buzz instead.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
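A rough Python analogy of this output capture, assuming the command's standard output has been saved to a file (file name here is illustrative):

```python
def read_string(path):
    # Like WDL read_string(): read the whole file as one string and
    # strip the trailing newline that echo appends.
    with open(path) as f:
        return f.read().rstrip("\n")

# Simulate what `echo 7` would leave in the task's stdout file
with open("stdout.txt", "w") as f:
    f.write("7\n")

print(read_string("stdout.txt"))  # 7
```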
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements, and to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, which may suggest that it be <code>SSD</code> storage.
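The structure of such a string is simple: a disk name, a size in gigabytes, and a disk type. A hypothetical parser, just to make the three fields explicit:

```python
def parse_disks(spec):
    """Split a Cromwell-style disks string like "local-disk 1 SSD"
    into (name, size_gb, disk_type). Illustrative only."""
    name, size, disk_type = spec.split()
    return name, int(size), disk_type

print(parse_disks("local-disk 1 SSD"))  # ('local-disk', 1, 'SSD')
```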
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need one, but you do need one in WDL 1.1, and Toil doesn't actually send your outputs anywhere yet if you don't have one, so we're going to make one. We need to collect together all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
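To sanity-check what the workflow should produce before running it, here is a plain-Python rendering of the same FizzBuzz rules (an analogy for checking expected output, not how Toil executes the WDL):

```python
def fizzbuzz(item_count, to_fizz=3, to_buzz=5, fizzbuzz_override=None):
    results = []
    for i in range(item_count):
        one_based = i + 1  # FizzBuzz starts at 1, range() at 0
        if one_based % to_fizz == 0 and one_based % to_buzz == 0:
            results.append(fizzbuzz_override or "FizzBuzz")
        elif one_based % to_fizz == 0:
            results.append("Fizz")
        elif one_based % to_buzz == 0:
            results.append("Buzz")
        else:
            # What the stringify_number task does
            results.append(str(one_based))
    return results

print(fizzbuzz(15))
```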
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
==Debugging Workflows==
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
===Debugging Options===
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs are reproduced like this.
===Reading the Log===
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which will happen when either the command line command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard error at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stderr.txt:
And
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard output at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stdout.txt:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
===Reproducing Problems===
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
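In other words, the full on-disk path is just the job store directory joined with the file ID. A sketch, using the example values from above:

```python
import os

job_store = "/private/groups/patenlab/anovak/jobstore"  # your --jobStore value
file_id = ("files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/"
           "file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam")

full_path = os.path.join(job_store, file_id)
print(full_path)
```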
===More Ways of Finding Files===
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
===Using Development Versions of Toil===
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
Toil will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
==Additional WDL resources==
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
007834ed9e2d825a835e1c1fccf3364e4f8cc5cd
419
418
2023-08-10T15:17:19Z
Anovak
4
wikitext
text/x-wiki
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows on the Genomics Institute's Phoenix cluster. By the end, you will be able to run workflows on Phoenix with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Setup=
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should '''not''' run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
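Optionally, to avoid typing the username every time, you can add an entry like this to <code>~/.ssh/config</code> on your own computer (with <code>flastname</code> replaced by your actual cluster username):

```
Host phoenix.prism
    User flastname
```

After that, a plain <code>ssh phoenix.prism</code> will connect as the right user.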
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter '''will not''' look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''', to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. So, use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${HOME}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${HOME}/.cache/miniwdl"' >>~/.bashrc
After that, '''log out and log back in again''', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day], so it is important not to skip this step.
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
Start by downloading the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
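For instance, an inputs file could instead reference a URL (the URL below is hypothetical, just to show the shape):

```json
{
  "hello_caller.who": "https://example.com/names.txt"
}
```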
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --pty bash -i
This will start a new shell; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
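If you load that printed JSON programmatically, the escape sequences decode automatically; for example, here is a sketch in Python using a fragment of the output above:

```python
import json

# A fragment of the printed output JSON, with a \u00f3 escape sequence:
raw = '{"hello_caller.messages": ["Hello, Mridula Resurrecci\\u00f3n!"]}'
outputs = json.loads(raw)
# json.loads turns the escapes back into the actual characters:
message = outputs["hello_caller.messages"][0]
```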
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
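With these declarations, an inputs file must provide <code>item_count</code>, but it can also override the defaults and set the optional input. For example (the values here are just an illustration):

```json
{
  "FizzBuzz.item_count": 15,
  "FizzBuzz.fizzbuzz_override": "FizzBuzz!"
}
```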
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
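If it helps, the scatter above can be pictured as a Python comprehension (an analogy only; the item count here is arbitrarily fixed at 5):

```python
# A WDL scatter behaves roughly like a Python comprehension: the body
# runs once per input value (potentially in parallel), and each variable
# declared inside becomes an array of per-iteration values outside it.
numbers = list(range(5))              # like: Array[Int] numbers = range(item_count)
one_based = [i + 1 for i in numbers]  # like: Int one_based = i + 1, gathered into Array[Int]
```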
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, but only in the branch where we don't produce a noise instead.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
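The branch structure above can be pictured in Python, where <code>None</code> plays the role of a value from an un-executed WDL conditional (an analogy for illustration, not part of the workflow):

```python
def fizzbuzz_result(one_based, to_fizz=3, to_buzz=5, fizzbuzz_override=None):
    # Variables from branches that don't run stay None, like null in WDL.
    fizz = "Fizz" if one_based % to_fizz == 0 else None
    buzz = "Buzz" if one_based % to_buzz == 0 else None
    fizzbuzz = None
    if fizz is not None and buzz is not None:
        # like: select_first([fizzbuzz_override, "FizzBuzz"])
        fizzbuzz = fizzbuzz_override if fizzbuzz_override is not None else "FizzBuzz"
    # The stringify_number task just turns the number into a string.
    the_string = str(one_based) if fizz is None and buzz is None else None
    # like: select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
    return next(v for v in [fizzbuzz, fizz, buzz, the_string] if v is not None)
```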
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code> (Bash-like substitution, but with a tilde) to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
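As a sketch of what happens here, <code>read_string(stdout())</code> is roughly equivalent to reading the captured output file and stripping the trailing line terminator (a Python analogy, not how Toil is actually implemented):

```python
# What `echo 42` writes to the stdout file captured by the runner:
captured_stdout = "42\n"
# read_string() reads the contents and removes trailing line terminators:
the_string = captured_stdout.rstrip("\r\n")
```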
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We'll also tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, which may suggest that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need one, but it is required in WDL 1.1, and Toil doesn't actually send your outputs anywhere yet if you don't have one, so we're going to make one. We need to collect together all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs are reproduced like this.
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which will happen when either the command line command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard error at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stderr.txt:
And
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard output at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stdout.txt:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try to find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
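If you'd rather not paste the URI into a website, the same decoding can be done locally; here is a sketch in Python using the standard library, with the URI from the log line above:

```python
from urllib.parse import unquote

# The toilfile: URI as it appears (percent-encoded) in the Toil log:
uri = ("toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob"
       "%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351"
       "%2FSample.chr14.bam/Sample.chr14.bam")
decoded = unquote(uri)
# The job-store-relative path is everything after the last colon:
relative_path = decoded.rsplit(":", 1)[-1]
```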
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
Toil will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL dcoumentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
b32acae3cbf5ff88ec9f812a747e024bd9ec420d
420
419
2023-08-10T15:21:13Z
Anovak
4
/* Writing the file */
wikitext
text/x-wiki
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows on the Genomics Institute's Phoenix cluster. By the end, you will be able to run workflows on Phoenix with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Setup=
Before we begin, you will need a computer to work at that you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should '''not''' run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter '''will not''' look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''', to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. So, use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${HOME}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${HOME}/.cache/miniwdl"' >>~/.bashrc
After that, '''log out and log back in again''', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day], so it is important not to skip this step.
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
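A malformed inputs file tends to produce confusing errors later, so it can help to sanity-check that the file parses as JSON before running the workflow. This is just a hypothetical convenience, assuming <code>python3</code> is available on the head node:

```shell
# Write the inputs file, then confirm it is valid JSON.
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
# python3 -m json.tool exits with a nonzero status on invalid JSON.
python3 -m json.tool inputs.json
```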
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --pty bash -i
This will start a new shell; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
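If you want to see the escaped strings decoded without opening the files, you can feed the output JSON through any JSON parser; for example, assuming <code>python3</code> is available:

```shell
# JSON \u escapes decode to the original characters when parsed.
printf '%s' '{"hello_caller.messages": ["Hello, Gershom \u0160arlota!"]}' \
  | python3 -c 'import json, sys; print(json.load(sys.stdin)["hello_caller.messages"][0])'
# Prints: Hello, Gershom Šarlota!
```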
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare for a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
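For example, an inputs file for this workflow only has to provide <code>item_count</code>; the other inputs can be omitted or overridden. Here's a hypothetical inputs file that also sets <code>fizzbuzz_override</code> (again sanity-checked with <code>python3</code>, if available):

```shell
# Only item_count is required; the rest have defaults or are optional.
cat >fizzbuzz_inputs.json <<'EOF'
{
  "FizzBuzz.item_count": 15,
  "FizzBuzz.fizzbuzz_override": "FizzBuzz!"
}
EOF
# Confirm the file is valid JSON (fails with nonzero status if not).
python3 -m json.tool fizzbuzz_inputs.json
```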
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
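To see what this computes, here is the same range-and-increment logic sketched in plain shell (sequentially, unlike the parallel scatter), for an <code>item_count</code> of 5:

```shell
# range(5) produces 0 through 4; each iteration increments by one.
for i in $(seq 0 4); do
  echo $((i + 1))
done
# Prints 1 through 5, one per line.
```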
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
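Since shell <code>if</code> statements do have <code>else</code> branches, the same branching can be written as one chain in shell, which may make the intended logic easier to see. This <code>fizzbuzz_one</code> function is purely an illustration, not part of the workflow:

```shell
# Classify one number the way the WDL conditionals above do.
fizzbuzz_one() {
  local n=$1 to_fizz=3 to_buzz=5
  if [ $((n % to_fizz)) -eq 0 ] && [ $((n % to_buzz)) -eq 0 ]; then
    echo "FizzBuzz"
  elif [ $((n % to_fizz)) -eq 0 ]; then
    echo "Fizz"
  elif [ $((n % to_buzz)) -eq 0 ]; then
    echo "Buzz"
  else
    echo "$n"  # Just a normal number.
  fi
}
fizzbuzz_one 15  # Prints FizzBuzz
```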
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, at least in the branch where the call actually ran (that is, when we didn't make a noise instead).
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
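The trailing-newline stripping that <code>read_string()</code> does is similar to shell command substitution, which also drops trailing newlines:

```shell
# $(...) drops the trailing newline that echo adds, like read_string() does.
the_string="$(echo 42)"
echo "[$the_string]"  # Prints [42], with no stray newline inside the brackets.
```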
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We'll also tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, optionally suggesting that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, WDL 1.0 isn't supposed to require one, but WDL 1.1 does, and Toil doesn't actually send your outputs anywhere yet if you don't have one, so we're going to make one. We need to collect together all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
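For an <code>item_count</code> of 5, the gathered <code>result</code> array holds one string per scatter iteration. Here's that gathering sketched with a bash array, purely as an illustration:

```shell
# Collect one result per iteration, like the scatter gathers into an array.
results=()
for i in $(seq 0 4); do
  n=$((i + 1))
  if [ $((n % 3)) -eq 0 ] && [ $((n % 5)) -eq 0 ]; then results+=("FizzBuzz")
  elif [ $((n % 3)) -eq 0 ]; then results+=("Fizz")
  elif [ $((n % 5)) -eq 0 ]; then results+=("Buzz")
  else results+=("$n")
  fi
done
printf '%s\n' "${results[@]}"  # Prints: 1, 2, Fizz, 4, Buzz (one per line)
```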
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs are reproduced like this.
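If you saved the run's log to a file, you can pull out just the per-job sections with standard tools. Here's a hypothetical demonstration on a tiny stand-in log file:

```shell
# Make a tiny stand-in for a saved debug log.
cat >toil.log <<'EOF'
[other log lines]
=========>
Toil job log is here
<=========
[more log lines]
EOF
# Print only the sections between the markers.
sed -n '/=========>/,/<=========/p' toil.log
```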
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which will happen when either the command line command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard error at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stderr.txt:
And
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard output at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stdout.txt:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
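If you'd rather not paste job store URIs into a website, you can decode them locally; for example, assuming <code>python3</code> is available:

```shell
# URL-decode the toilfile: URI from the log to recover the relative path.
python3 -c 'from urllib.parse import unquote; print(unquote("toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam"))'
```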
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
Toil will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
3ae46bb50fe340ede0fdc2c23f74e123c73bd26c
421
420
2023-08-10T15:21:50Z
Anovak
4
/* Writing Tasks */
wikitext
text/x-wiki
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like Workflow Description language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows on the Genomics Institute's Phoenix cluster. By the end, you will be able to run workflows on Phoenix with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Setup=
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should '''not''' run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter '''will not''' look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''' to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. So, use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${HOME}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${HOME}/.cache/miniwdl"' >>~/.bashrc
After that, '''log out and log back in again''', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day], so it is important not to skip this step.
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
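If you prefer, you can also generate and sanity-check an inputs file programmatically. This sketch (using the same filename and key as the <code>echo</code> command above) writes the file and then parses it back to confirm it is valid JSON:

```python
import json

# Keys are "<workflow name>.<input name>"; file values are paths or URLs.
inputs = {"hello_caller.who": "./names.txt"}

# Write the inputs file, then read it back to confirm it is valid JSON.
with open("inputs.json", "w") as f:
    json.dump(inputs, f)

with open("inputs.json") as f:
    print(json.load(f))  # {'hello_caller.who': './names.txt'}
```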
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --pty bash -i
This will start a new shell; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
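Those <code>\u00f3</code>-style sequences are standard JSON Unicode escapes, and any JSON parser will turn them back into the original characters. A quick Python check:

```python
import json

# JSON escapes non-ASCII characters like ó as \u00f3; parsing restores them.
name = json.loads('"Mridula Resurrecci\\u00f3n"')
print(name)  # Mridula Resurrección
```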
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
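If it helps, WDL inputs behave much like Python function parameters. This is an analogy only (not runnable WDL): a default value makes an input optional to supply, and a trailing <code>?</code> means the value itself may be null.

```python
from typing import Optional

# Analogy only: each WDL input is like a function parameter.
def fizzbuzz_inputs(item_count: int,                           # Int item_count (required)
                    to_fizz: int = 3,                          # Int to_fizz = 3
                    to_buzz: int = 5,                          # Int to_buzz = 5
                    fizzbuzz_override: Optional[str] = None):  # String? fizzbuzz_override
    return (item_count, to_fizz, to_buzz, fizzbuzz_override)

# Only the required input must be supplied; the rest fall back to defaults.
print(fizzbuzz_inputs(20))  # (20, 3, 5, None)
```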
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
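To make the <code>map()</code> analogy concrete, here is the same computation in plain Python (an analogy only; WDL is free to run the scatter body for every element in parallel):

```python
# WDL: Array[Int] numbers = range(item_count)   (with item_count = 20 here)
numbers = list(range(20))

# WDL: scatter (i in numbers) { Int one_based = i + 1 }
# Each iteration is independent, which is what lets WDL parallelize them.
one_based = [i + 1 for i in numbers]

print(one_based[:5])  # [1, 2, 3, 4, 5]
```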
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, but only when the call actually ran and we didn't produce a noise instead.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
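As a cross-check of the logic above, here is a Python rendering of the scatter body, with a <code>select_first()</code> that mirrors the WDL function's first-non-null behavior (a sketch of the semantics, not how Toil executes it):

```python
def select_first(values):
    # Like WDL's select_first(): return the first non-null value.
    for v in values:
        if v is not None:
            return v
    raise ValueError("all values were null")

def fizzbuzz_word(one_based, to_fizz=3, to_buzz=5, fizzbuzz_override=None):
    # Variables from un-executed conditionals stay None, just like in the WDL.
    fizz = "Fizz" if one_based % to_fizz == 0 else None
    buzz = "Buzz" if one_based % to_buzz == 0 else None
    fizzbuzz = None
    if fizz is not None and buzz is not None:
        fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
    # Stand-in for the stringify_number task call on normal numbers.
    stringified = str(one_based) if fizz is None and buzz is None else None
    return select_first([fizzbuzz, fizz, buzz, stringified])

print([fizzbuzz_word(i + 1) for i in range(15)])
# ['1', '2', 'Fizz', '4', 'Buzz', 'Fizz', '7', '8', 'Fizz', 'Buzz', '11', 'Fizz', '13', '14', 'FizzBuzz']
```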
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
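The trailing-newline stripping matters: <code>echo</code> ends its output with a newline, and without the stripping every captured string would carry one. A rough Python model of what <code>read_string(stdout())</code> does (a sketch, not Toil's actual implementation):

```python
def read_string(path):
    # Read the whole file and strip trailing line endings,
    # approximating WDL's read_string() applied to the stdout file.
    with open(path) as f:
        return f.read().rstrip("\r\n")

# Simulate the task: `echo 7` writes "7\n" to the stdout file.
with open("stdout.txt", "w") as f:
    f.write("7\n")

print(repr(read_string("stdout.txt")))  # '7'
```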
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We'll also tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, which may suggest that it be <code>SSD</code> storage.
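To make the format concrete, the string has three whitespace-separated fields; this small hypothetical parser (not part of Toil) shows how they break down:

```python
def parse_disks(spec):
    # Cromwell-style disks string: "<mount point> <size in GB> <type>".
    # "local-disk" refers to the task's working disk; the type suggests SSD or HDD.
    mount, size_gb, disk_type = spec.split()
    return {"mount": mount, "size_gb": int(size_gb), "type": disk_type}

print(parse_disks("local-disk 1 SSD"))
# {'mount': 'local-disk', 'size_gb': 1, 'type': 'SSD'}
```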
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need this, but you do need it in 1.1 and Toil doesn't actually send your outputs anywhere yet if you don't have one, so we're going to make one. We need to collect together all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs are reproduced like this.
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which will happen when either the command line command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard error at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stderr.txt:
And
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard output at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stdout.txt:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
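Instead of a web decoder, you can do the same URL-decoding with Python's standard library (the URI below is the example from the log above):

```python
from urllib.parse import unquote

uri = ("toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob"
       "%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351"
       "%2FSample.chr14.bam/Sample.chr14.bam")

decoded = unquote(uri)                       # turns %3A into ":" and %2F into "/"
job_store_path = decoded.rsplit(":", 1)[-1]  # the part after the last colon
print(job_store_path)                        # path relative to the job store
```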
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
Toil will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like Workflow Description language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows on the Genomics Institute's Phoenix cluster. By the end, you will be able to run workflows on Phoenix with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Setup=
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should *not* run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter *will not* look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''', to restart bash and pick up the change.
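To see why this works, here is a self-contained sketch of the PATH mechanism. It uses a throwaway directory and a made-up <code>demo-tool</code> script rather than touching your real <code>~/.local/bin</code>:

```shell
# Sketch of the PATH mechanism: bash searches each directory listed in
# PATH, in order, so prepending a directory makes its programs findable
# by bare name. "demo-tool" is a made-up script for illustration.
demo_bin="$(mktemp -d)"
printf '#!/bin/sh\necho it works\n' >"${demo_bin}/demo-tool"
chmod +x "${demo_bin}/demo-tool"
export PATH="${demo_bin}:${PATH}"
command -v demo-tool   # shows the full path bash resolved
demo-tool              # prints: it works
```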
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. So, use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${HOME}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${HOME}/.cache/miniwdl"' >>~/.bashrc
After that, '''log out and log back in again''', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day], so it is important not to skip this step.
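For reference, here are the same two settings applied directly to the current shell, with the cache directories created up front (a sketch; the variable names and paths are exactly the ones configured above):

```shell
# The same two cache settings, applied to the current shell, with the
# cache directories created up front. These are the variable names that
# Toil, Singularity, and MiniWDL read from the environment.
export SINGULARITY_CACHEDIR="${HOME}/.singularity/cache"
export MINIWDL__SINGULARITY__IMAGE_CACHE="${HOME}/.cache/miniwdl"
mkdir -p "${SINGULARITY_CACHEDIR}" "${MINIWDL__SINGULARITY__IMAGE_CACHE}"
printf 'SINGULARITY_CACHEDIR=%s\n' "${SINGULARITY_CACHEDIR}"
printf 'MINIWDL__SINGULARITY__IMAGE_CACHE=%s\n' "${MINIWDL__SINGULARITY__IMAGE_CACHE}"
```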
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
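As a quick sanity check before running anything, you can confirm the inputs file is well-formed JSON (this assumes <code>python3</code> is available, as it is anywhere Toil is installed):

```shell
# Sanity check: rebuild the inputs file and confirm it parses as JSON.
# json.tool pretty-prints the file, or fails loudly on a syntax error.
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
python3 -m json.tool inputs.json
```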
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --pty bash -i
This will start a new shell; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
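If you're curious, those <code>\u00f3</code> sequences in the printed JSON are plain-ASCII escapes; any JSON parser recovers the real characters (again assuming <code>python3</code> is available):

```shell
# The \u00f3 in the workflow's JSON output is an ASCII-safe JSON escape;
# parsing the JSON turns it back into the real character.
python3 -c 'import json; print(json.loads("\"Resurrecci\\u00f3n.txt\""))'
```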
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, but only when the call actually ran and we didn't produce a noise instead.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
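As an aside, you can see at the shell level the trailing newline that <code>read_string()</code> strips:

```shell
# echo appends a newline to its output; od -c makes it visible as \n.
# WDL's read_string() strips that trailing newline when it turns the
# stdout file into a String.
echo 7 | od -c | head -n 1
```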
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We're also going to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, which may suggest that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need one, but you do need it in 1.1, and without one Toil doesn't yet deliver your outputs anywhere, so we're going to write one. We need to collect all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs are reproduced like this.
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which happens either when the command is written wrong or when the error detection code in the tool you are trying to run detects and reports an error.
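As a refresher, this is how exit statuses look at the shell level:

```shell
# Refresher on exit statuses: $? holds the status of the last command,
# and anything nonzero is what a workflow runner counts as task failure.
status=0
false || status=$?
echo "false exited with status ${status}"
true && echo "true exited with status $?"
```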
Go up higher in the log until you find lines that look like:
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard error at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stderr.txt:
And
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard output at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stdout.txt:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
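You can also do the URL-decoding locally instead of pasting the URI into a website (this assumes <code>python3</code> is available):

```shell
# URL-decode the toilfile URI locally with Python's standard library.
# The URI is the example from the log excerpt above.
python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote(sys.argv[1]))' \
  'toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam'
```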
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work on, on which you are able to install software, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should *not* run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter *will not* look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, **log out and log back in**, to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. So, use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${HOME}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${HOME}/.cache/miniwdl"' >>~/.bashrc
After that, '''log out and log back in again''', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day], so it is important not to skip this step.
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
Now, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
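A malformed inputs file produces confusing errors at workflow start, so it can be worth sanity-checking the JSON before running anything. This optional check assumes <code>python3</code> is available on the head node:

```shell
# Recreate the inputs file as above, then check that it parses as JSON.
# python3 -m json.tool pretty-prints valid JSON and exits nonzero otherwise.
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
python3 -m json.tool inputs.json
```

If the file is valid, the command prints the formatted JSON back; if not, it reports where the parse failed.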
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --pty bash -i
This will start a new shell; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
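If you want to inspect the results, you can print each greeting file along with its name; this is ordinary shell, nothing Toil-specific:

```shell
# Print each greeting file in the output directory, labeled by filename.
# The -e guard skips the literal glob pattern if no .txt files exist.
for f in local_run/*.txt; do
    [ -e "$f" ] || continue
    echo "== $f =="
    cat "$f"
done
```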
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
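For example, once this workflow is complete, an inputs file only has to set <code>item_count</code>; the inputs with defaults and the optional input can be overridden but don't have to be. The values and the <code>fizzbuzz_custom.json</code> filename here are just for illustration:

```shell
# item_count has no default, so an inputs file must set it.
# to_fizz and fizzbuzz_override are overridden purely to show the syntax.
echo '{"FizzBuzz.item_count": 15, "FizzBuzz.to_fizz": 4, "FizzBuzz.fizzbuzz_override": "FizzBang!"}' >fizzbuzz_custom.json
```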
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
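If you know Python, WDL's <code>range()</code> behaves the same way as Python's: it counts from 0 up to, but not including, its argument. Here Python is used only to illustrate the values; it is not part of the workflow:

```shell
# Show the values range(5) produces. WDL's range() matches Python's
# here: it starts at 0 and stops before the argument.
python3 -c 'print(list(range(5)))'   # prints [0, 1, 2, 3, 4]
```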
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
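The behavior of <code>select_first()</code> can be sketched in plain shell. This hypothetical <code>first_non_empty</code> function returns the first non-empty argument, which only approximates WDL: real <code>select_first()</code> skips <code>null</code> values, and WDL distinguishes <code>null</code> from an empty string.

```shell
# Rough shell analogy for WDL's select_first(): return the first
# non-empty argument. (WDL skips nulls, not empty strings, so this
# is an illustration, not an exact translation.)
first_non_empty() {
    for arg in "$@"; do
        if [ -n "$arg" ]; then
            echo "$arg"
            return 0
        fi
    done
    return 1
}

first_non_empty "" "FizzBuzz" "Fizz"   # prints FizzBuzz
```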
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, at least when we didn't make a noise instead.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
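The trailing-newline stripping that <code>read_string()</code> does has a familiar shell analogue: command substitution with <code>$(...)</code> also removes trailing newlines. Shown here purely as an illustration:

```shell
# Like WDL's read_string(), $(...) drops trailing newlines,
# so the brackets land directly around the value.
value="$(printf '42\n')"
echo "[$value]"   # prints [42]
```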
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We'll also tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, optionally requesting that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need one, but you do need it in 1.1, and Toil doesn't actually send your outputs anywhere yet if you don't have one, so we're going to make one. We need to collect together all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs are reproduced like this.
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which will happen when either the command line command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
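If exit statuses are unfamiliar, you can see them directly in the shell; <code>$?</code> holds the exit status of the most recently run command:

```shell
# 0 means success; any other value (1-255) is a failure.
true
echo "exit status of true: $?"
# Inside the || branch, $? still holds the status of the failed command.
false || echo "exit status of false: $?"
```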
Go up higher in the log until you find lines that look like:
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard error at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stderr.txt:
And
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard output at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stdout.txt:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to search for the files by name. For example, to look for <code>Sample.bam</code>:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
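Both steps (URL-decoding the URI, then taking everything after the last colon) can also be done at the command line, assuming <code>python3</code> is available:

```shell
# The example URI from the log line above.
uri='toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam'

# URL-decode the URI using Python's standard library.
decoded="$(python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote(sys.argv[1]))' "$uri")"

# Strip everything up to and including the last colon, leaving
# the file's path relative to the job store.
echo "${decoded##*:}"
```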
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
8b87f461fa5c65293235e03906afb286e4069755
425
423
2023-10-20T21:21:46Z
Anovak
4
Remind people where the data needs to live.
wikitext
text/x-wiki
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should ''not'' run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter '''will not''' look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''', to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. So, use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${HOME}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${HOME}/.cache/miniwdl"' >>~/.bashrc
After that, '''log out and log back in again''', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day], so it is important not to skip this step.
==Configuring your Phoenix Environment==
'''Do not try to store data in your home directory on Phoenix!''' The home directories are meant for code and programs. Any data worth running a workflow on should be in a directory under <code>/public/groups</code>. You will probably need to email the admins to get added to a group so you can create a directory to work in somewhere under <code>/public/groups</code>. Usually you would end up with <code>/public/groups/YOURGROUPNAME/YOURUSERNAME</code>.
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, go to your user directory under <code>/public/groups</code>, and make a directory to work in.
cd /public/groups/YOURGROUPNAME/YOURUSERNAME
mkdir workflow-test
cd workflow-test
Next, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
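To make the path options concrete, here are three equivalent ways to point <code>hello_caller.who</code> at a names file. The relative form is the one used above; the absolute path and URL below are made-up examples, shown only for their shape:

```shell
# Relative path (resolved against the inputs file's location):
echo '{"hello_caller.who": "./names.txt"}' >inputs_rel.json
# Absolute path (hypothetical location):
echo '{"hello_caller.who": "/public/groups/mygroup/me/names.txt"}' >inputs_abs.json
# URL (hypothetical; Toil can fetch input files from URLs):
echo '{"hello_caller.who": "https://example.org/names.txt"}' >inputs_url.json
```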
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --pty bash -i
This will start a new shell; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil at a path on the shared filesystem where it can create a directory to store information that all the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
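Given those declarations, an inputs file for this workflow must set <code>item_count</code> and may set the others. For example (the override string here is invented for illustration):

```shell
# item_count is required; to_fizz and to_buzz have defaults;
# fizzbuzz_override is optional and may be omitted entirely.
cat >fizzbuzz_inputs_example.json <<'EOF'
{
  "FizzBuzz.item_count": 15,
  "FizzBuzz.fizzbuzz_override": "FizzBuzz!!"
}
EOF
cat fizzbuzz_inputs_example.json
```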
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
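If scatters are new to you, the one above behaves roughly like this shell loop, except that WDL runs the iterations in parallel and collects each declared variable into an array:

```shell
item_count=5
one_based_list=""
# range(item_count) yields 0 .. item_count-1; the scatter body
# shifts each index to a one-based number.
for i in $(seq 0 $((item_count - 1)))
do
  one_based=$((i + 1))
  one_based_list="$one_based_list $one_based"
done
# The trimmed list corresponds to the Array[Int] the scatter produces.
echo "${one_based_list# }"
```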
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the call with <code>.</code> access.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> function returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
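The trailing-newline trimming that <code>read_string()</code> performs matches what shell command substitution does, so the captured value is <code>"7"</code>, not <code>"7\n"</code>. A quick shell illustration:

```shell
# echo prints the number followed by a newline; capturing it with
# command substitution strips the trailing newline, just like
# WDL's read_string(stdout()).
captured=$(echo 7)
[ "$captured" = "7" ] && echo "newline trimmed"
```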
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We'll also tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, which may suggest that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, WDL 1.0 doesn't require one, but WDL 1.1 does, and Toil doesn't actually send your outputs anywhere yet if you don't have one, so we're going to write one. We need to collect all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
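If you want to sanity-check the workflow's results, here is plain FizzBuzz in shell with the same defaults (<code>to_fizz</code> = 3, <code>to_buzz</code> = 5); the strings in <code>fizzbuzz_out.json</code> should match this sequence:

```shell
# Reference FizzBuzz, mirroring the workflow's logic: check both
# divisors first, then each alone, else keep the number itself.
item_count=20
results=""
for i in $(seq 1 "$item_count")
do
  if [ $((i % 3)) -eq 0 ] && [ $((i % 5)) -eq 0 ]; then r="FizzBuzz"
  elif [ $((i % 3)) -eq 0 ]; then r="Fizz"
  elif [ $((i % 5)) -eq 0 ]; then r="Buzz"
  else r="$i"
  fi
  results="$results $r"
done
echo "${results# }"
```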
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code>, so that the files shipped between jobs are stored in a place where you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs are reproduced like this.
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which will happen when either the command line command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard error at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stderr.txt:
And
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard output at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stdout.txt:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try to find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
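To see how this works without a real run, you can mock up a job-store-like tree in a scratch directory (all the IDs below are invented) and locate a file in it:

```shell
# Create a throwaway directory shaped like a job store's files/ tree
# (the kind-/instance-/file- names are hypothetical).
mock_store=$(mktemp -d)
mkdir -p "$mock_store/files/for-job/kind-WDLTaskJob/instance-abc123/file-def456"
touch "$mock_store/files/for-job/kind-WDLTaskJob/instance-abc123/file-def456/Sample.bam"
# find locates the file by name anywhere under the store.
found=$(find "$mock_store" -name "Sample.bam")
echo "$found"
```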
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
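You can also do the decoding in the shell instead of a web tool. These URIs use only the <code>%3A</code> (colon) and <code>%2F</code> (slash) escapes, so two <code>sed</code> substitutions suffice, and a third keeps everything after the last colon:

```shell
uri='toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam'
# Undo the percent-escapes used in toilfile: URIs.
decoded=$(printf '%s\n' "$uri" | sed -e 's|%3A|:|g' -e 's|%2F|/|g')
# The job-store-relative path is everything after the last colon.
relpath=$(printf '%s\n' "$decoded" | sed 's|.*:||')
echo "$relpath"
```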
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should '''not''' run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter '''will not''' look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''', to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
==Configuring your Phoenix Environment==
'''Do not try to store data in your home directory on Phoenix!''' The home directories are meant for code and programs. Any data worth running a workflow on should be in a directory under <code>/public/groups</code>. You will probably need to email the admins to get added to a group so you can create a directory to work in somewhere under <code>/public/groups</code>. Usually you would end up with <code>/public/groups/YOURGROUPNAME/YOURUSERNAME</code>.
Remember this path; we will need it later.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. On the Phoenix cluster, though, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. Since these images can be large, and the home directory quota is only 30 GB, we can't keep these caches in your home directory. We will need to use the <code>/public/groups/YOURGROUPNAME/YOURUSERNAME</code> directory you created earlier.
Record that directory in your <code>~/.bashrc</code> file by running this command (after editing it to use your actual group and user names):
echo 'BIG_DATA_DIR=/public/groups/YOURGROUPNAME/YOURUSERNAME' >>~/.bashrc
Then use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${BIG_DATA_DIR}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${BIG_DATA_DIR}/.cache/miniwdl"' >>~/.bashrc
After that, '''log out and log back in again''', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day], so it is important not to skip this step.
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, go to your user directory under <code>/public/groups</code>, and make a directory to work in.
cd /public/groups/YOURGROUPNAME/YOURUSERNAME
mkdir workflow-test
cd workflow-test
Next, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
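If you'd rather not hand-edit JSON, you can also generate the inputs file with a few lines of Python (a sketch using the same workflow and file names as above):

```python
import json

# Keys are "<workflow name>.<input name>"; values are the input values.
# File paths are resolved relative to the inputs file's location.
inputs = {"hello_caller.who": "./names.txt"}

with open("inputs.json", "w") as f:
    json.dump(inputs, f)
```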
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --pty bash -i
This will start a new shell; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
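The <code>\uXXXX</code> sequences are just JSON's ASCII-safe encoding of non-ASCII characters; any JSON parser restores them. For example, in Python:

```python
import json

# JSON escape sequences like \u00f3 decode back to the original characters.
decoded = json.loads('["Hello, Mridula Resurrecci\\u00f3n!", "Hello, Gershom \\u0160arlota!"]')
print(decoded[0])  # Hello, Mridula Resurrección!
```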
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
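As the text above says, a scatter works a bit like Python's <code>map()</code>; here is that analogy made concrete (plain Python, not WDL semantics; real scatters may run their bodies in parallel):

```python
item_count = 5

# WDL: Array[Int] numbers = range(item_count)
numbers = list(range(item_count))

# WDL scatter: each iteration declares one_based = i + 1; outside the
# scatter, the per-iteration values gather into an array.
one_based = [i + 1 for i in numbers]
print(one_based)  # [1, 2, 3, 4, 5]
```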
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
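WDL's <code>select_first()</code> returns the first non-null element of its argument array. A Python sketch of that behavior (an illustration of the semantics, not Toil's implementation):

```python
def select_first(values):
    """Return the first value that is not None; error if all are None."""
    for v in values:
        if v is not None:
            return v
    raise ValueError("all values were null")

# A variable declared inside a conditional whose guard was false is null.
fizzbuzz_override = None
print(select_first([fizzbuzz_override, "FizzBuzz"]))  # FizzBuzz
```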
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, but only if we didn't make a noise instead.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
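The newline trimming that <code>read_string()</code> performs can be sketched in Python (an approximation: the real WDL function takes a <code>File</code> and reads it from disk):

```python
def read_string(text):
    # WDL's read_string() reads a file's contents and strips the
    # trailing newline, so "42\n" from echo comes back as "42".
    return text.rstrip("\r\n")

print(read_string("42\n"))  # 42
```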
We're also going to add a <code>runtime</code> section to our task, to specify resource requirements, and to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, which may suggest that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need one, but you do need it in 1.1, and without one Toil doesn't yet send your outputs anywhere, so we're going to write one. We need to collect all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> block, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
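As a sanity check on the logic, the whole workflow condenses to a few lines of plain Python (a sketch that mirrors the WDL above, not how Toil actually executes it):

```python
def fizzbuzz(item_count, to_fizz=3, to_buzz=5, fizzbuzz_override=None):
    """Mirror of the FizzBuzz WDL workflow's scatter logic."""
    results = []
    for i in range(item_count):
        one_based = i + 1
        fizz = "Fizz" if one_based % to_fizz == 0 else None
        buzz = "Buzz" if one_based % to_buzz == 0 else None
        fb = None
        if fizz is not None and buzz is not None:
            fb = fizzbuzz_override if fizzbuzz_override is not None else "FizzBuzz"
        # Stands in for the stringify_number task call.
        stringified = str(one_based) if fizz is None and buzz is None else None
        # WDL: select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
        results.append(next(v for v in (fb, fizz, buzz, stringified) if v is not None))
    return results

print(fizzbuzz(15))
```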
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Debugging Options==
When debugging a workflow, make sure to run it with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code>, so that the files shipped between jobs are stored in a place where you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs are reproduced like this.
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code. That happens either when the command is written incorrectly, or when the error detection code in the tool you are trying to run detects and reports an error.
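As a reminder, every process reports a numeric exit status when it finishes, and anything nonzero is treated as failure. A quick Python illustration:

```python
import subprocess
import sys

# Run a child process that deliberately fails; its exit status is what
# the workflow runner inspects to decide whether the task succeeded.
proc = subprocess.run([sys.executable, "-c", "import sys; sys.exit(1)"])
print(proc.returncode)  # 1
```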
Go up higher in the log until you find lines that look like:
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard error at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stderr.txt:
and
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard output at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stdout.txt:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need your tool's input files in order to do this. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
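You can also do the decoding locally with Python's standard library instead of a web page:

```python
from urllib.parse import unquote

# The toilfile: URI as it appears in the log (wrapped for readability).
uri = ("toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob"
       "%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351"
       "%2FSample.chr14.bam/Sample.chr14.bam")

decoded = unquote(uri)                  # restores the ':' and '/' characters
relative_path = decoded.split(":")[-1]  # part after the last colon
print(relative_path)
```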
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should ''not'' run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter ''will not'' look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''', to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
==Configuring your Phoenix Environment==
'''Do not try to store data in your home directory on Phoenix!''' The home directories are meant for code and programs. Any data worth running a workflow on should be in a directory under <code>/public/groups</code>. You will probably need to email the admins to get added to a group so you can create a directory to work in somewhere under <code>/public/groups</code>. Usually you would end up with <code>/public/groups/YOURGROUPNAME/YOURUSERNAME</code>.
Remember this path; we will need it later.
==Configuring Toil for Phoenix==
'''If you get errors about mutexes, lock files, or other weird problems with Singularity''', try moving these directories to inside <code>/data/tmp</code> on the individual nodes, or unsetting them and letting Toil use its defaults (and exhaust our Docker pull limits). [https://github.com/DataBiosphere/toil/issues/4654 It is not clear that <code>/private/groups</code> actually implements the necessary file locking correctly.]
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, go to your user directory under <code>/public/groups</code>, and make a directory to work in.
cd /public/groups/YOURGROUPNAME/YOURUSERNAME
mkdir workflow-test
cd workflow-test
Next, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
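For example, the same input could instead be given as an absolute path (the path below is a placeholder following this tutorial's conventions, and <code>inputs_abs.json</code> is just a name picked for illustration). Validating the JSON before a run catches quoting mistakes early:

```shell
# Hypothetical variant of the inputs file using an absolute path (placeholder path):
cat >inputs_abs.json <<'EOF'
{"hello_caller.who": "/public/groups/YOURGROUPNAME/YOURUSERNAME/workflow-test/names.txt"}
EOF
# Check that the file is valid JSON before handing it to toil-wdl-runner:
python3 -m json.tool inputs_abs.json
```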
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --pty bash -i
This will start a new shell; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
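Those <code>\u00f3</code>-style sequences in the printed JSON are ordinary JSON Unicode escapes; any JSON parser will turn them back into the real characters. A quick sketch:

```shell
# Decode one of the escaped strings from the output JSON with Python's json module.
# The \\u0160 below is the JSON escape for the Š in "Šarlota".
python3 -c 'import json; print(json.loads("\"Gershom \\u0160arlota.txt\""))'
# prints: Gershom Šarlota.txt
```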
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
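For example, an inputs file for this workflow might look like the following (the file name <code>fizzbuzz_inputs.json</code> and the specific values are choices made for this example): <code>item_count</code> must be set because it has no default, <code>to_buzz</code> overrides its default of 5, and the optional <code>fizzbuzz_override</code> could be omitted entirely.

```shell
# Hypothetical inputs file exercising a required input, an overridden
# default, and an optional input:
cat >fizzbuzz_inputs.json <<'EOF'
{
  "FizzBuzz.item_count": 15,
  "FizzBuzz.to_buzz": 7,
  "FizzBuzz.fizzbuzz_override": "Fizz Buzz!"
}
EOF
# Confirm it parses as JSON:
python3 -m json.tool fizzbuzz_inputs.json
```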
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
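For comparison, here is the same decision logic in plain Bash (not part of the workflow itself); the <code>${out:-$i}</code> fallback plays the role that <code>select_first()</code> plays in the WDL:

```shell
to_fizz=3
to_buzz=5
for i in $(seq 1 15); do
  out=""
  if [ $((i % to_fizz)) -eq 0 ]; then out="Fizz"; fi
  if [ $((i % to_buzz)) -eq 0 ]; then out="${out}Buzz"; fi
  # Fall back to the number itself when neither condition fired
  echo "${out:-$i}"
done
```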
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, at least on the iterations where we didn't make a noise instead of running it.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
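As a rough analogy (a comparison made here, not anything from the WDL spec), <code>read_string(stdout())</code> behaves like Bash command substitution: it captures a command's output with trailing newlines stripped.

```shell
# Command substitution strips the trailing newline that echo adds,
# much like read_string() strips trailing newlines from stdout().
captured="$(echo 42)"
if [ "$captured" = "42" ]; then
  echo "trailing newline stripped"
fi
```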
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements, and to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, optionally specifying that it should be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need one, but you do need one in 1.1, and Toil doesn't actually deliver your outputs anywhere yet if you don't have one, so we're going to make one. We need to collect all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs are reproduced like this.
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which happens either when the command is written wrong, or when the tool you are trying to run detects and reports an error itself.
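The exit status convention here is the usual Unix one, which you can see in any shell:

```shell
# 0 means success; anything else is failure, and the WDL runner
# treats a nonzero status as a failed task command.
sh -c 'exit 0'
echo "first status: $?"
sh -c 'exit 1' || echo "second status: $?"
```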
Go up higher in the log until you find lines that look like:
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard error at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stderr.txt:
And
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard output at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stdout.txt:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
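You can also do the URL-decoding locally instead of with a web tool; this sketch uses Python's standard <code>urllib</code> on the URI from the log line above:

```shell
uri='toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam'
# urllib.parse.unquote() turns the %XX escapes back into : and / characters
python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote(sys.argv[1]))' "$uri"
```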
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should ''not'' run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter ''will not'' look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''', to restart bash and pick up the change.
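What that <code>echo</code> line does is prepend <code>~/.local/bin</code> to the list of directories bash searches for commands; you can see the effect in any shell:

```shell
# Prepend ~/.local/bin to the command search path, as the ~/.bashrc line does.
export PATH="${HOME}/.local/bin:${PATH}"
# The first entry searched is now your ~/.local/bin (shown expanded):
echo "$PATH" | cut -d: -f1
```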
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
==Configuring your Phoenix Environment==
'''Do not try to store data in your home directory on Phoenix!''' The home directories are meant for code and programs. Any data worth running a workflow on should be in a directory under <code>/public/groups</code>. You will probably need to email the admins to get added to a group so you can create a directory to work in somewhere under <code>/public/groups</code>. Usually you would end up with <code>/public/groups/YOURGROUPNAME/YOURUSERNAME</code>.
Remember this path; we will need it later.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. On the Phoenix cluster, though, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. Since these images can be large, and the home directory quota is only 30 GB, we can't keep the caches in your home directory. We will need to use the <code>/public/groups/YOURGROUPNAME/YOURUSERNAME</code> directory you created earlier.
Make sure that that directory is available in your <code>~/.bashrc</code> file by editing and running this command:
echo 'BIG_DATA_DIR=/public/groups/YOURGROUPNAME/YOURUSERNAME' >>~/.bashrc
Then, after logging out and in again, use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${BIG_DATA_DIR}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${BIG_DATA_DIR}/.cache/miniwdl"' >>~/.bashrc
After that, '''log out and log back in again''', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day].
'''If you get errors about mutexes, lock files, or other weird problems with Singularity''', try moving these directories to inside <code>/data/tmp</code> on the individual nodes, or unsetting them and letting Toil use its defaults (and exhaust our Docker pull limits). [https://github.com/DataBiosphere/toil/issues/4654 It is not clear that <code>/private/groups</code> actually implements the necessary file locking correctly.]
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, go to your user directory under <code>/public/groups</code>, and make a directory to work in.
cd /public/groups/YOURGROUPNAME/YOURUSERNAME
mkdir workflow-test
cd workflow-test
Next, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
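If you are preparing inputs for many runs, it can be easier to build the inputs JSON programmatically than with <code>echo</code>. Here is an illustrative Python sketch that writes the same file; the <code>hello_caller.who</code> key comes from the workflow and input names in <code>self_test.wdl</code>:

```python
import json

# Keys are "<workflow name>.<input name>"; values are the input values,
# with file inputs given as paths relative to this JSON file.
inputs = {"hello_caller.who": "./names.txt"}

with open("inputs.json", "w") as f:
    json.dump(inputs, f)
```

Building the dictionary in code makes it easy to add more inputs later without worrying about shell quoting.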
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --pty bash -i
This will start a new shell; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
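The <code>\u00f3</code>-style sequences are standard JSON escapes, so any JSON parser will turn them back into the original characters. For example, a quick Python sketch using an abbreviated version of the output line shown above:

```python
import json

# One (abbreviated) line of toil-wdl-runner's standard output.
line = '{"hello_caller.messages": ["Hello, Mridula Resurrecci\\u00f3n!", "Hello, Gershom \\u0160arlota!"]}'

# json.loads() decodes \u00f3 back to "ó" and \u0160 back to "Š".
outputs = json.loads(line)
```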
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, but only when we don't make a noise instead.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. And we're going to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, which may suggest that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, you aren't supposed to need one in WDL 1.0, but you do need one in WDL 1.1, and Toil doesn't actually deliver your outputs anywhere yet if you don't have one, so we're going to write one. We need to collect all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> block, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code>, so that the files shipped between jobs are stored somewhere you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs are reproduced like this.
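Since the markers are fixed strings, you can pull the per-job logs out of a saved main log mechanically. Here is a hypothetical helper sketch; the marker strings are taken from the example above:

```python
import re

def extract_job_logs(main_log: str) -> list:
    """Return the text between each =========> / <========= marker pair."""
    return [m.strip() for m in
            re.findall(r"=========>\n(.*?)\n<=========", main_log, re.DOTALL)]

# A miniature example log with one embedded job log.
sample = "stuff\n=========>\nToil job log is here\n<=========\nmore stuff\n"
```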
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which happens either when the command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard error at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stderr.txt:
And
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard output at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stdout.txt:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need your tool's input files in order to do this. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should ''not'' run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter ''will not'' look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''', to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
==Configuring your Phoenix Environment==
'''Do not try to store data in your home directory on Phoenix!''' The home directories are meant for code and programs. Any data worth running a workflow on should be in a directory under <code>/public/groups</code>. You will probably need to email the admins to get added to a group so you can create a directory to work in somewhere under <code>/public/groups</code>. Usually you would end up with <code>/public/groups/YOURGROUPNAME/YOURUSERNAME</code>.
Remember this path; we will need it later.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. On the Phoenix cluster, however, we do have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. Since these files can be large, and the home directory quota is only 30 GB, we can't keep these caches in your home directory. We will need to use the <code>/public/groups/YOURGROUPNAME/YOURUSERNAME</code> directory you created earlier.
Make sure that that directory is available in your <code>~/.bashrc</code> file by editing and running this command:
echo 'BIG_DATA_DIR=/public/groups/YOURGROUPNAME/YOURUSERNAME' >>~/.bashrc
Then use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${BIG_DATA_DIR}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${BIG_DATA_DIR}/.cache/miniwdl"' >>~/.bashrc
After that, '''log out and log back in again''', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day].
'''If you get errors about mutexes, lock files, or other weird problems with Singularity''', try moving these directories to inside <code>/data/tmp</code> on the individual nodes, or unsetting them and letting Toil use its defaults (and exhaust our Docker pull limits). [https://github.com/DataBiosphere/toil/issues/4654 It is not clear that <code>/private/groups</code> actually implements the necessary file locking correctly.]
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, go to your user directory under <code>/public/groups</code>, and make a directory to work in.
cd /public/groups/YOURGROUPNAME/YOURUSERNAME
mkdir workflow-test
cd workflow-test
Next, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's put it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
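A quoting mistake in the inputs file will cause a confusing failure later, so it can be worth validating the JSON up front. Python's built-in <code>json.tool</code> module (assuming <code>python3</code> is available, as it is on most clusters) exits nonzero on malformed input:

```shell
# Recreate the inputs file and parse it back; a stray quote or comma would
# make json.tool fail instead of pretty-printing the document.
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
python3 -m json.tool inputs.json
```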
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --time=02:00:00 --partition=medium --pty bash -i
This will start a new shell that can run for up to 2 hours; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
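A quick sanity check that the cut worked (a placeholder name list is generated here so the example is self-contained; run the same <code>wc -l</code> on your real <code>100_names.txt</code>):

```shell
# Generate a stand-in 1000-line list, cut it to 100 lines, and count them.
seq 1000 | sed 's/^/Name /' >demo_1000_names.txt
head -n100 demo_1000_names.txt >demo_100_names.txt
wc -l <demo_100_names.txt
```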
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
mkdir -p logs
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
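If scatters are new to you, the block above is roughly analogous to this shell loop, with the important differences that scatter iterations run in parallel and their per-iteration variables are gathered into arrays afterwards:

```shell
# Sequential sketch of: scatter (i in range(5)) { Int one_based = i + 1 }
for i in 0 1 2 3 4; do
    echo $((i + 1))
done
```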
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, but only for the numbers where we didn't produce a noise instead.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
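If you want intuition for the trailing-newline trimming, it behaves much like shell command substitution, which also strips trailing newlines (a loose analogy, not exact WDL semantics):

```shell
# echo emits "42\n"; the $(...) substitution drops the trailing newline,
# so the brackets end up flush against the value: [42]
result="$(echo 42)"
echo "[$result]"
```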
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We'll also tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, optionally suggesting that it should be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, you aren't supposed to need one in WDL 1.0, but you do need one in WDL 1.1, and Toil doesn't actually deliver your outputs anywhere yet if you don't have one, so we're going to make one. We need to collect all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> block, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
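The optional inputs declared in the workflow can be set from the same file. For example, this hypothetical variant fizzes on multiples of 2 and buzzes on multiples of 7 (the values are purely illustrative):

```shell
# Override the workflow's default to_fizz/to_buzz values in a new inputs file.
echo '{"FizzBuzz.item_count": 14, "FizzBuzz.to_fizz": 2, "FizzBuzz.to_buzz": 7}' >fizzbuzz_custom.json
cat fizzbuzz_custom.json
```

<code>FizzBuzz.fizzbuzz_override</code> could be set the same way, since optional inputs simply stay <code>null</code> when the inputs file omits them.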
Then run it on the cluster with Toil:
mkdir -p logs
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code>, so that the files shipped between jobs are stored in a place where you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs are reproduced like this.
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which will happen either when the command line command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
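As a refresher on what a failing exit code means at the shell level:

```shell
# Each command leaves its exit status in $?: 0 means success, 1-255 mean failure.
true
echo "true exited with $?"
false || echo "false exited with $?"
```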
Go up higher in the log until you find lines that look like:
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard error at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stderr.txt:
And
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard output at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stdout.txt:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
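You can also do the decoding locally instead of with a web tool, using Python's <code>urllib.parse.unquote()</code> (assuming <code>python3</code> is available):

```shell
# Percent-decode the toilfile URI from the log to recover the job store path.
python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote(sys.argv[1]))' \
    'toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam'
```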
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should '''not''' run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter '''will not''' look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''', to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
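To see why that <code>PATH</code> line matters, here is a self-contained demonstration with a throwaway script (<code>pathdemo</code> is a made-up name, standing in for <code>toil-wdl-runner</code>):

```shell
# Install a tiny script into ~/.local/bin, the same place pip puts Toil's
# entry points, then show that bash finds it once ~/.local/bin is on PATH.
mkdir -p "$HOME/.local/bin"
printf '#!/bin/sh\necho found in .local/bin\n' >"$HOME/.local/bin/pathdemo"
chmod +x "$HOME/.local/bin/pathdemo"
export PATH="$HOME/.local/bin:$PATH"
pathdemo
```

Without the <code>export PATH=…</code> line, the final <code>pathdemo</code> call would fail with "command not found", which is exactly what happens to <code>toil-wdl-runner</code> before you edit <code>~/.bashrc</code>.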
==Configuring your Phoenix Environment==
'''Do not try to store data in your home directory on Phoenix!''' The home directories are meant for code and programs. Any data worth running a workflow on should be in a directory under <code>/public/groups</code>. You will probably need to email the admins to get added to a group so you can create a directory to work in somewhere under <code>/public/groups</code>. Usually you would end up with <code>/public/groups/YOURGROUPNAME/YOURUSERNAME</code>.
Remember this path; we will need it later.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we do have a shared filesystem, so we should configure Toil to use it for caching the Docker container images used for running workflow steps. Since these images can be large, and the home directory quota is only 30 GB, we can't keep them in your home directory. We will need to use the <code>/public/groups/YOURGROUPNAME/YOURUSERNAME</code> directory you created earlier.
Make sure that that directory is available in your <code>~/.bashrc</code> file by editing and running this command:
echo 'BIG_DATA_DIR=/public/groups/YOURGROUPNAME/YOURUSERNAME' >>~/.bashrc
Then use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${BIG_DATA_DIR}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${BIG_DATA_DIR}/.cache/miniwdl"' >>~/.bashrc
After that, '''log out and log back in again''', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day].
'''If you get errors about mutexes, lock files, or other weird problems with Singularity''', try moving these directories to inside <code>/data/tmp</code> on the individual nodes, or unsetting them and letting Toil use its defaults (and exhaust our Docker pull limits). [https://github.com/DataBiosphere/toil/issues/4654 It is not clear that <code>/private/groups</code> actually implements the necessary file locking correctly.]
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, go to your user directory under <code>/public/groups</code>, and make a directory to work in.
cd /public/groups/YOURGROUPNAME/YOURUSERNAME
mkdir workflow-test
cd workflow-test
Next, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
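If you prefer to generate inputs files from a script, the same JSON can be written with Python. This is just a sketch mirroring the <code>echo</code> command above; the key and filename come from the tutorial:

```python
import json

# The keys are "<workflow name>.<input name>"; file values are paths
# relative to the inputs file (absolute paths and URLs also work).
inputs = {"hello_caller.who": "./names.txt"}
with open("inputs.json", "w") as f:
    json.dump(inputs, f)
```

Generating the file this way avoids quoting mistakes when input values contain special characters.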
==Testing at small scale single-machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem=4G --time=02:00:00 --partition=medium --pty bash -i
This will start a new shell, with 2 cores and 4 GB of memory, that can run for 2 hours; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
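Those <code>\u00f3</code>-style sequences are standard JSON escapes for non-ASCII characters; any JSON parser will decode them back to the original text. For example, in Python (illustrative):

```python
import json

# The runner prints JSON with non-ASCII characters escaped; parsing the
# JSON recovers the real filenames.
escaped = '["local_run/Mridula Resurrecci\\u00f3n.txt", "local_run/Gershom \\u0160arlota.txt"]'
filenames = json.loads(escaped)
```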
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
Additionally, since [https://github.com/DataBiosphere/toil/issues/4686 Toil can't manage Slurm partitions itself], we will use the <code>TOIL_SLURM_ARGS</code> environment variable to tell Toil how long jobs should be allowed to run for (2 hours) and what [[Slurm Queues (Partitions) and Resource Management | partition]] they should go in.
mkdir -p logs
TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium" toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
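WDL's required, defaulted, and optional inputs behave much like fields with defaults in other languages. Here is a rough Python dataclass analogy (purely an illustration, not how Toil represents inputs internally):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FizzBuzzInputs:
    item_count: int                         # required: no default, must be provided
    to_fizz: int = 3                        # has a default, may be omitted
    to_buzz: int = 5                        # has a default, may be omitted
    fizzbuzz_override: Optional[str] = None # optional: may remain None (WDL null)
```

Constructing <code>FizzBuzzInputs(item_count=20)</code> works, while omitting <code>item_count</code> is an error, just as a workflow run fails if a required input is missing from the inputs file.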
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
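A scatter is close in spirit to a Python list comprehension, except that WDL runs the iterations in parallel. The scatter above computes, roughly:

```python
item_count = 5  # example value for the workflow input

numbers = range(item_count)            # WDL: range(item_count) -> [0, 1, 2, 3, 4]
one_based = [i + 1 for i in numbers]   # the scatter body, applied to each element
```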
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, though only when the call actually ran (here, when we didn't produce a noise instead).
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
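To check the logic, here is the whole scatter body modeled as a plain Python function. This is just a sketch: <code>str(one_based)</code> stands in for the <code>stringify_number</code> task, and variables from un-executed conditionals are modeled as <code>None</code> (WDL's <code>null</code>):

```python
def select_first(values):
    # Like WDL select_first(): the first non-null value in the array.
    return next(v for v in values if v is not None)

def fizzbuzz_result(one_based, to_fizz=3, to_buzz=5, fizzbuzz_override=None):
    # Variables declared in conditionals that did not run stay None.
    fizz = "Fizz" if one_based % to_fizz == 0 else None
    buzz = "Buzz" if one_based % to_buzz == 0 else None
    fizzbuzz = None
    if fizz is not None and buzz is not None:
        fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
    # str(one_based) stands in for the stringify_number task call.
    number = str(one_based) if fizz is None and buzz is None else None
    return select_first([fizzbuzz, fizz, buzz, number])
```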
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
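To make the trailing-newline handling concrete, the behavior of <code>read_string()</code> can be mimicked in Python (illustrative only):

```python
def read_string(path):
    # Like WDL read_string(): read the whole file as one string,
    # dropping any trailing newline characters.
    with open(path) as f:
        return f.read().rstrip("\r\n")
```

So the <code>echo</code> output <code>"7\n"</code> becomes the string <code>"7"</code>, with no newline attached.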
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We're also going to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, which may suggest that it be <code>SSD</code> storage.
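The string is just three whitespace-separated fields. A tiny parser sketch (the exact grammar Toil accepts may be more permissive than this):

```python
def parse_disks(spec):
    # "local-disk 1 SSD" -> mount name, size in gigabytes, disk type.
    mount, size_gb, disk_type = spec.split()
    return mount, int(size_gb), disk_type
```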
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need this, but it is required in WDL 1.1, and Toil doesn't actually deliver your outputs anywhere yet if you don't have one, so we're going to make one. We need to collect together all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium" toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs are reproduced like this.
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which will happen when either the command line command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
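"Exit status" here is the ordinary Unix process exit code. You can see the same thing outside of any workflow runner; for example, <code>grep</code> exits with status 1 when it finds no matches:

```python
import subprocess

# grep exits nonzero when it finds nothing; a workflow runner would
# report this as a failed task command.
result = subprocess.run(["grep", "needle", "/dev/null"],
                        capture_output=True, text=True)
exit_status = result.returncode
```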
Go up higher in the log until you find lines that look like:
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard error at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stderr.txt:
And
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard output at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stdout.txt:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
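In other words, the on-disk location is just the job store path joined with the file ID:

```python
import os.path

job_store = "/private/groups/patenlab/anovak/jobstore"  # your --jobStore value
file_id = "files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam"
on_disk = os.path.join(job_store, file_id)
```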
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
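The decode-and-split can also be done in one step with Python's standard library, instead of a web tool:

```python
from urllib.parse import unquote

uri = "toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam"
decoded = unquote(uri)
# The part after the last colon is the path relative to the job store.
rel_path = decoded.rsplit(":", 1)[-1]
```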
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should '''not''' run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter ''will not'' look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''', to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
==Configuring your Phoenix Environment==
'''Do not try and store data in your home directory on Phoenix!''' The home directories are meant for code and programs. Any data worth running a workflow on should be in a directory under <code>/public/groups</code>. You will probably need to email the admins to get added to a group so you can create a directory to work in somewhere under <code>/public/groups</code>. Usually you would end up with <code>/public/groups/YOURGROUPNAME/YOURUSERNAME</code>.
Remember this path; we will need it later.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. Since these image files can be large, and the home directory quota is only 30 GB, we can't keep them in your home directory; we will need to use the <code>/public/groups/YOURGROUPNAME/YOURUSERNAME</code> directory you created earlier.
Make sure that that directory is available in your <code>~/.bashrc</code> file by editing and running this command:
echo 'BIG_DATA_DIR=/public/groups/YOURGROUPNAME/YOURUSERNAME' >>~/.bashrc
Then use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${BIG_DATA_DIR}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${BIG_DATA_DIR}/.cache/miniwdl"' >>~/.bashrc
After that, '''log out and log back in again''', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day].
'''If you get errors about mutexes, lock files, or other weird problems with Singularity''', try moving these directories to inside <code>/data/tmp</code> on the individual nodes, or unsetting them and letting Toil use its defaults (and exhaust our Docker pull limits). [https://github.com/DataBiosphere/toil/issues/4654 It is not clear that <code>/private/groups</code> actually implements the necessary file locking correctly.]
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, go to your user directory under <code>/public/groups</code>, and make a directory to work in.
cd /public/groups/YOURGROUPNAME/YOURUSERNAME
mkdir workflow-test
cd workflow-test
Next, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
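Since the inputs file is ordinary JSON, you can also generate it with a short script instead of <code>echo</code>, which avoids shell-quoting mistakes; a minimal sketch:

```python
import json

# Keys are "<workflow name>.<input name>"; file values are paths relative
# to the inputs file (absolute paths and URLs also work).
inputs = {"hello_caller.who": "./names.txt"}

with open("inputs.json", "w") as f:
    json.dump(inputs, f)

# Round-trip to confirm the file is valid JSON with the expected key.
with open("inputs.json") as f:
    assert json.load(f) == {"hello_caller.who": "./names.txt"}
```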
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --time=02:00:00 --partition=medium --pty bash -i
This will start a new shell that can run for 2 hours; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
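The <code>\u00f3</code>-style sequences in that output are just JSON string escapes; any JSON parser decodes them back to the real characters. For example:

```python
import json

# The printed output escapes non-ASCII characters; \u00f3 stands for "ó".
escaped = '"local_run/Mridula Resurrecci\\u00f3n.txt"'
assert json.loads(escaped) == "local_run/Mridula Resurrección.txt"
```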
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
Additionally, since [https://github.com/DataBiosphere/toil/issues/4686 Toil can't manage Slurm partitions itself], we will use the <code>TOIL_SLURM_ARGS</code> environment variable to tell Toil how long jobs should be allowed to run for (2 hours) and what [[Slurm Queues (Partitions) and Resource Management | partition]] they should go in.
mkdir -p logs
TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium" toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
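To make those rules concrete, here is a small, purely illustrative Python model (not Toil's actual code) of how a runner resolves these four inputs against a user's inputs file: defaults fill in missing values, optionals may stay null, and a missing required input is an error.

```python
def resolve_inputs(user_inputs):
    """Illustrative model of WDL input resolution (not Toil's real logic)."""
    resolved = {
        "to_fizz": user_inputs.get("to_fizz", 3),                   # has a default
        "to_buzz": user_inputs.get("to_buzz", 5),                   # has a default
        "fizzbuzz_override": user_inputs.get("fizzbuzz_override"),  # optional, may be None
    }
    if "item_count" not in user_inputs:
        # No default and not optional: the inputs file must provide it.
        raise ValueError("missing required input: item_count")
    resolved["item_count"] = user_inputs["item_count"]
    return resolved

r = resolve_inputs({"item_count": 20})
assert r == {"to_fizz": 3, "to_buzz": 5, "fizzbuzz_override": None, "item_count": 20}
```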
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
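In Python terms, the scatter above behaves like a comprehension over the input array: the body runs once per element (conceptually in parallel), and each declared variable is gathered into an array afterwards.

```python
item_count = 20
numbers = list(range(item_count))  # WDL's range() also starts at 0

# The scatter body runs for each value; one_based is gathered into an array.
one_based = [i + 1 for i in numbers]

assert one_based[0] == 1 and one_based[-1] == item_count
```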
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
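WDL's <code>select_first()</code> simply picks the first non-null entry of an array; in Python terms, with <code>None</code> standing in for WDL's null:

```python
def select_first(values):
    """Python stand-in for WDL's select_first(): first non-None value."""
    for v in values:
        if v is not None:
            return v
    raise ValueError("select_first: all values were null")

fizzbuzz_override = None  # the optional input was not set
assert select_first([fizzbuzz_override, "FizzBuzz"]) == "FizzBuzz"
assert select_first(["custom!", "FizzBuzz"]) == "custom!"
```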
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access. (Here we only call the task for the numbers where we didn't make a noise instead.)
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill those in with <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
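WDL's <code>read_string()</code> behaves roughly like reading the whole file and dropping the trailing newline; a rough Python model:

```python
def read_string(text):
    """Rough Python model of WDL's read_string(): drop trailing line endings."""
    return text.rstrip("\r\n")

# `echo 7` writes "7\n" to standard output; read_string() yields "7".
assert read_string("7\n") == "7"
```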
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We're also going to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, which may suggest that it be <code>SSD</code> storage.
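The Cromwell-style string has three space-separated fields (mount point, size in gigabytes, storage type); an illustrative parser, not Toil's actual code:

```python
def parse_disks(spec):
    """Illustrative parse of a Cromwell-style disks string (not Toil's real parser)."""
    mount, size_gb, disk_type = spec.split()
    return mount, int(size_gb), disk_type

assert parse_disks("local-disk 1 SSD") == ("local-disk", 1, "SSD")
```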
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need this, but you do need it in 1.1, and Toil doesn't yet send your outputs anywhere if you don't have one, so we're going to make one. We need to collect together all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
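As a cross-check, the whole workflow's per-number logic can be mirrored in a few lines of Python; with the defaults (<code>to_fizz = 3</code>, <code>to_buzz = 5</code>, no override) it yields the familiar sequence:

```python
def fizzbuzz(item_count, to_fizz=3, to_buzz=5, fizzbuzz_override=None):
    """Python mirror of the FizzBuzz workflow's scatter logic."""
    results = []
    for i in range(item_count):       # like `scatter (i in numbers)`
        n = i + 1                     # like `Int one_based = i + 1`
        if n % to_fizz == 0 and n % to_buzz == 0:
            # like select_first([fizzbuzz_override, "FizzBuzz"])
            results.append(fizzbuzz_override if fizzbuzz_override is not None else "FizzBuzz")
        elif n % to_fizz == 0:
            results.append("Fizz")
        elif n % to_buzz == 0:
            results.append("Buzz")
        else:
            results.append(str(n))    # like the stringify_number task
    return results

assert fizzbuzz(15) == ["1", "2", "Fizz", "4", "Buzz", "Fizz", "7", "8",
                        "Fizz", "Buzz", "11", "Fizz", "13", "14", "FizzBuzz"]
```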
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium" toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs are reproduced like this.
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which happens either when the command is written wrong, or when the error detection code in the tool you are trying to run detects and reports a problem.
Go up higher in the log until you find lines that look like:
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard error at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stderr.txt:
And
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard output at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stdout.txt:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
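The decode-and-split steps can be scripted; a minimal sketch, using the exact URI from the log line above:

```python
from urllib.parse import unquote

uri = ("toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob"
       "%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351"
       "%2FSample.chr14.bam/Sample.chr14.bam")

decoded = unquote(uri)
# The path relative to the job store is everything after the last colon.
relative_path = decoded.rsplit(":", 1)[-1]

assert relative_path.startswith("files/for-job/kind-WDLTaskJob/")
assert relative_path.endswith("/Sample.chr14.bam/Sample.chr14.bam")
```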
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should ''not'' run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter ''will not'' look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''', to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
==Configuring your Phoenix Environment==
'''Do not try to store data in your home directory on Phoenix!''' The home directories are meant for code and programs. Any data worth running a workflow on should be in a directory under <code>/public/groups</code>. You will probably need to email the admins to get added to a group so you can create a directory to work in somewhere under <code>/public/groups</code>. Usually you would end up with <code>/public/groups/YOURGROUPNAME/YOURUSERNAME</code>.
Remember this path; we will need it later.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. However, since these files can be large, and the home directory quota is only 30 GB, we can't keep these in your home directory. We will need to use the <code>/public/groups/YOURGROUPNAME/YOURUSERNAME</code> directory you created earlier.
Make sure that that directory is available in your <code>~/.bashrc</code> file by editing and running this command:
echo 'BIG_DATA_DIR=/public/groups/YOURGROUPNAME/YOURUSERNAME' >>~/.bashrc
Then use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${BIG_DATA_DIR}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${BIG_DATA_DIR}/.cache/miniwdl"' >>~/.bashrc
After that, '''log out and log back in again''', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day].
'''If you get errors about mutexes, lock files, or other weird problems with Singularity''', try moving these directories to inside <code>/data/tmp</code> on the individual nodes, or unsetting them and letting Toil use its defaults (and exhaust our Docker pull limits). [https://github.com/DataBiosphere/toil/issues/4654 It is not clear that <code>/private/groups</code> actually implements the necessary file locking correctly.]
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, go to your user directory under <code>/public/groups</code>, and make a directory to work in.
cd /public/groups/YOURGROUPNAME/YOURUSERNAME
mkdir workflow-test
cd workflow-test
Next, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's put it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
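If your input values contain quotes or non-ASCII characters, hand-writing JSON with <code>echo</code> gets fragile. As an alternative sketch, you can generate the inputs file with Python's standard <code>json</code> module:

```python
import json

# Generate the inputs file programmatically; json.dump handles any
# quoting or escaping that a hand-written echo command might get wrong.
inputs = {"hello_caller.who": "./names.txt"}
with open("inputs.json", "w") as f:
    json.dump(inputs, f)

print(open("inputs.json").read())  # {"hello_caller.who": "./names.txt"}
```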
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --time=02:00:00 --partition=medium --pty bash -i
This will start a new shell on a worker node that can run for up to 2 hours.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
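If you want to check the printed output JSON programmatically, you can parse it with Python; the <code>\u00f3</code>-style escapes decode back to their original characters. This is just a sketch using an abridged copy of the output shown above:

```python
import json

# An abridged copy of the JSON that toil-wdl-runner printed to stdout.
raw = ('{"hello_caller.messages": ["Hello, Mridula Resurrecci\\u00f3n!", '
       '"Hello, Gershom \\u0160arlota!", "Hello, Ritchie Ravi!"]}')

outputs = json.loads(raw)
for message in outputs["hello_caller.messages"]:
    # The \uXXXX escapes come back as real Unicode characters.
    print(message)
```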
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
Additionally, since [https://github.com/DataBiosphere/toil/issues/4686 Toil can't manage Slurm partitions itself], we will use the <code>TOIL_SLURM_ARGS</code> environment variable to tell Toil how long jobs should be allowed to run for (2 hours) and what [[Slurm Queues (Partitions) and Resource Management | partition]] they should go in.
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
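As an illustrative analogy only (not anything Toil does internally), the input block behaves like a Python function signature: inputs with defaults don't have to be supplied, <code>?</code>-typed inputs may be left null, and the one required input must come from the caller (the inputs file):

```python
from typing import Optional

# Python analogy for the WDL input block above: workflow inputs behave
# like function parameters. item_count has no default, so the caller
# (the inputs JSON) must supply it; fizzbuzz_override may stay None.
def fizz_buzz_inputs(item_count: int,
                     to_fizz: int = 3,
                     to_buzz: int = 5,
                     fizzbuzz_override: Optional[str] = None):
    return item_count, to_fizz, to_buzz, fizzbuzz_override

print(fizz_buzz_inputs(20))  # (20, 3, 5, None)
```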
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
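As a rough analogy only (Toil actually runs scatter iterations as separate, parallel jobs), the scatter above behaves like a Python comprehension over the array, with each iteration's declarations collected into arrays afterwards:

```python
# A rough Python analogy for the WDL scatter above (illustrative only).
item_count = 5
numbers = list(range(item_count))  # WDL: range(item_count)

# Each scatter iteration declares one_based; outside the scatter,
# those per-iteration values are visible together as an array.
one_based = [i + 1 for i in numbers]

print(one_based)  # [1, 2, 3, 4, 5]
```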
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
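In Python terms (purely a sketch of the semantics, not Toil's implementation), <code>select_first()</code> scans for the first non-null value, and a variable declared inside an un-executed conditional behaves as null:

```python
# Illustrative Python analogy for WDL's select_first(): it returns the
# first non-null (here, non-None) element of the array.
def select_first(values):
    for value in values:
        if value is not None:
            return value
    raise ValueError("select_first: all values were null")

# For one_based == 3: the fizz conditional ran, the others did not.
# Variables from un-executed WDL conditionals behave as null:
fizz = "Fizz"      # set, because 3 % 3 == 0
fizzbuzz = None    # not set, since 3 % 5 != 0
buzz = None        # not set
print(select_first([fizzbuzz, fizz, buzz, "3"]))  # Fizz
```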
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, but only for the iterations where the call actually ran.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> function returns a <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
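The effect of <code>read_string(stdout())</code> can be sketched in Python: the command's standard output is captured, read back, and stripped of trailing newlines. This is illustrative only:

```python
# Sketch of read_string(stdout()): the task command's standard output
# is captured, then read back with trailing newlines removed.
captured_stdout = "42\n"  # what an `echo 42` command would produce
the_string = captured_stdout.rstrip("\n")
print(repr(the_string))  # '42'
```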
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We'll also tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, optionally hinting that it should be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need one, but it is required in WDL 1.1, and Toil doesn't actually deliver your outputs anywhere yet if you don't have one, so we're going to add one. We need to collect all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
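To sanity-check the workflow's output, here is a plain-Python reference for the same logic (an illustrative re-implementation, not how Toil evaluates WDL); the <code>FizzBuzz.fizzbuzz_results</code> array in <code>fizzbuzz_out.json</code> should match it:

```python
# A plain-Python reference implementation of the workflow's logic,
# mirroring the select_first([fizzbuzz, fizz, buzz, number]) priority.
def fizzbuzz(item_count, to_fizz=3, to_buzz=5, fizzbuzz_override=None):
    results = []
    for i in range(item_count):
        n = i + 1  # the workflow's one_based value
        if n % to_fizz == 0 and n % to_buzz == 0:
            results.append(fizzbuzz_override or "FizzBuzz")
        elif n % to_fizz == 0:
            results.append("Fizz")
        elif n % to_buzz == 0:
            results.append("Buzz")
        else:
            results.append(str(n))
    return results

print(fizzbuzz(20))
```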
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs are reproduced like this.
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which happens either when the command is written wrong, or when the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard error at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stderr.txt:
And
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard output at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stdout.txt:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
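Since the file ID is just a path relative to the job store directory, you can join them mechanically; this Python sketch reproduces the path above:

```python
import os.path

# The --jobStore directory and the file ID copied from the log.
job_store = "/private/groups/patenlab/anovak/jobstore"
file_id = ("files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/"
           "file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam")

# The on-disk location is simply the job store path plus the file ID.
print(os.path.join(job_store, file_id))
```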
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try to find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
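Instead of a web-based decoder, you can also decode the URI with Python's standard library; this sketch recovers the job-store-relative path from the log line above:

```python
from urllib.parse import unquote

# The encoded toilfile: URI copied from the Toil debug log.
uri = ("toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob"
       "%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351"
       "%2FSample.chr14.bam/Sample.chr14.bam")

decoded = unquote(uri)
# The job-store-relative path is everything after the last colon.
relative_path = decoded.rsplit(":", 1)[-1]
print(relative_path)
```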
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
66b57c9e84db6bc307f897df19f2cce6bf56b7d3
AWS Account List and Numbers
0
22
424
328
2023-10-12T19:52:02Z
Weiler
3
wikitext
text/x-wiki
This is a list of our currently available AWS accounts and their account numbers:
ucsc-bd2k : 862902209576
ucsc-toil-dev : 318423852362
ucsc-vg-dev : 781907127277
ucsc-platform-dev : 719818754276
comparative-genomics-dev : 162786355865
nanopore-dev : 270442831226
ucsc-cgp-production : 097093801910
platform-hca-dev : 122796619775
anvil-dev : 608666466534
gi-gateway : 652235167018
pangenomics : 422448306679
braingeneers : 443872533066
ucsctreehouse : 238605363322
ucsc-bisti-dev : 851631505710
ucsc-genome-browser : 784962239183
dockstore-dev : 635220370222
ucsc-spatial : 541180793903
platform-hca-prod : 542754589326
platform-hca-portal : 158963592881
miga-lab : 156518225147
platform-anvil-dev : 289950828509
platform-anvil-prod : 465330168186
platform-anvil-portal : 166384485414
agc-runs : 598929688444
sequencing-center-cold-store : 436140841220
f1f62d636ee4c9c01f60481107bc0c0d0eb91734
Genomics Institute Computing Information
0
6
427
375
2023-10-23T23:05:51Z
Weiler
3
/* Slurm at the Genomics Institute */
wikitext
text/x-wiki
Welcome to the Genomics Institute Computing Information Repository! Browse the topics below for help in the area you are curious about.
== GI Public Computing Environment ==
*[[How to access the public servers]]
== GI Firewalled Computing Environment (PRISM) ==
*[[Access to the Firewalled Compute Servers]]
*[[Firewalled Computing Resources Overview]]
*[[Firewalled Environment Storage Overview]]
*[[Firewalled User Account and Storage Cost]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Cluster Etiquette]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[GPU Resources]]
*[[Quick Reference Guide]]
*[[Slurm Tips for vg]]
*[[Slurm Tips for Toil]]
*[[Using Docker under Slurm]]
*[[Phoenix WDL Tutorial]]
==Kubernetes Information==
*[[Computational Genomics Kubernetes Installation]]
*[[Undiagnosed Disease Project Kubernetes Installation]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
83b135b2514a833b3345b4c7e69d3fe45c14754b
435
427
2023-11-14T21:54:50Z
Weiler
3
/* Slurm at the Genomics Institute */
wikitext
text/x-wiki
Welcome to the Genomics Institute Computing Information Repository! Browse the topics below for help in the area you are curious about.
== GI Public Computing Environment ==
*[[How to access the public servers]]
== GI Firewalled Computing Environment (PRISM) ==
*[[Access to the Firewalled Compute Servers]]
*[[Firewalled Computing Resources Overview]]
*[[Firewalled Environment Storage Overview]]
*[[Firewalled User Account and Storage Cost]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Cluster Etiquette]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[GPU Resources]]
*[[Quick Reference Guide]]
*[[Slurm Queues (Partitions) and Resource Management]]
*[[Slurm Tips for vg]]
*[[Slurm Tips for Toil]]
*[[Using Docker under Slurm]]
*[[Phoenix WDL Tutorial]]
==Kubernetes Information==
*[[Computational Genomics Kubernetes Installation]]
*[[Undiagnosed Disease Project Kubernetes Installation]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
604f5ab5bf5b0ee3a4c0496982c310362f62c636
442
435
2023-11-27T03:30:04Z
Weiler
3
wikitext
text/x-wiki
Welcome to the Genomics Institute Computing Information Repository! Browse the topics below for help in the area you are curious about.
== GI Public Computing Environment ==
*[[How to access the public servers]]
== GI Firewalled Computing Environment (PRISM) ==
*[[Access to the Firewalled Compute Servers]]
*[[Firewalled Computing Resources Overview]]
*[[Firewalled Environment Storage Overview]]
*[[Firewalled User Account and Storage Cost]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Cluster Etiquette]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[GPU Resources]]
*[[Quick Reference Guide]]
*[[Slurm Queues (Partitions) and Resource Management]]
*[[Slurm Tips for vg]]
*[[Slurm Tips for Toil]]
*[[Using Docker under Slurm]]
*[[Phoenix WDL Tutorial]]
==General Docker Information==
*[[Running a Container as a non-root User]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
15aa55f5826610448296f291bbffc779e00ee310
Cluster Etiquette
0
47
428
2023-10-23T23:42:12Z
Weiler
3
Created page with "Begin!"
wikitext
text/x-wiki
Begin!
2230293d480ce70f013543beb0763396009710ab
430
428
2023-10-25T21:32:39Z
Weiler
3
wikitext
text/x-wiki
When running jobs on the cluster, you must be very aware of how those jobs will affect other users.
1: Always test your job by running one first. Just one. Note how much RAM, how many CPU cores, and how much time it takes to run. Then, when you submit 50 or 100 of those, you can specify limits in your Slurm batch file on how long the job should run, how much RAM it should use, and how many CPU cores it needs. That way, Slurm can stop jobs that inadvertently run too long or use too many resources.
2: Don't run too many jobs at once if they use a lot of disk I/O. If every job reads in a 100GB file, and you launch 20 of them, you could bring down the file server serving /private/groups. Run only maybe 5 at once in that case. You can limit your concurrent jobs by specifying something like this in your job batch file:
#SBATCH --array=[1-279]%10
inputList=$1
input=$(sed -n "$SLURM_ARRAY_TASK_ID"p $inputList)
some_command $input
41663ffedfc1d5fe14bc5f0e7534674d9a424a9b
431
430
2023-10-25T21:54:52Z
Weiler
3
wikitext
text/x-wiki
When running jobs on the cluster, you must be very aware of how those jobs will affect other users.
1: Always test your job by running one first. Just one. Note how much RAM, how many CPU cores, and how much time it takes to run. Then, when you submit 50 or 100 of those jobs, you can specify limits in your Slurm batch file on how long each job may run, how much RAM it may use, and how many cores it may use. That way, Slurm can stop jobs that inadvertently run too long or use too many resources.
2: Don't run too many jobs at once if they use a lot of disk I/O. If every job reads in a 100GB file and you launch 20 of them at the same time, you could bring down the file server serving /private/groups. In that case, run only a few (say, 5) at once, or introduce a random delay at the start of your jobs. You can limit your concurrent jobs by specifying something like this in your job batch file:
#SBATCH --array=1-279%10
inputList=$1
# Select the line of the input list corresponding to this array task
input=$(sed -n "${SLURM_ARRAY_TASK_ID}p" "$inputList")
some_command "$input"
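For point 1, the limits themselves go in '''#SBATCH''' directives at the top of the batch file. A minimal sketch with illustrative numbers (set them to what your test run actually used, plus some padding; '''my_analysis_command''' is a placeholder):

```shell
#!/bin/bash
#SBATCH --time=02:00:00        # Slurm kills the job if it runs longer than 2 hours
#SBATCH --mem=16G              # Slurm kills the job if it uses more than 16 GB of RAM
#SBATCH --cpus-per-task=4      # reserve 4 CPU cores for this job

my_analysis_command            # placeholder for your actual command
```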
829b4395cf67c51771c1380f03023c9e5990de4f
How to access the public servers
0
11
429
401
2023-10-24T17:07:59Z
Weiler
3
/* Storage */
wikitext
text/x-wiki
== How to Gain Access to the Public Genomics Institute Compute Servers ==
If you need access to the Genomics Institute compute servers please complete this request form:
https://app.smartsheet.com/b/form/a76dbd90ba0240ab9ea9d39b390586ce. There are two parts to this process.
1. User: please fill in ALL required fields and submit.
2. Sponsor/PI: you will receive an email from Smartsheet. Please fill in all required fields and submit.
Once we receive your completed request, we will create your account and go over the details with you in a short Zoom meeting.
== Account and Storage Cost ==
Costs for having an active UNIX account and for storage (per TB) are listed in this document, under "Genomics Project Support", specifically "Genomics IT Systems User Support per user" and "Genomics Data Storage per TB":
https://planning.ucsc.edu/budget/rates-and-assessments/recharge-rates/docs/2021-22-approved-recharge-rates.pdf
== Account Expiration ==
Your UNIX account will have an expiration date associated with it after creation, as requested by your sponsor. Please take note of this expiration date when your account is created.
You will receive notice by email when your account is about to expire. To renew, simply ask the PI that sponsored you (named in the notice) to email '''cluster-admin@soe.ucsc.edu''' requesting that your account be renewed for another year, or any other amount of time.
If your account expires, the account will be suspended and you will no longer be able to login or view any data you may have in our systems. Any automated scripts (owned by you) that run via cron or other mechanisms will cease to function.
== Server Types and Management ==
You can log into our public compute servers via SSH:
'''courtyard.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space, Ubuntu 22.04.2
'''park.gi.ucsc.edu''': 256GB RAM, 32 cores, 5TB local scratch space, Ubuntu 22.04.2
These servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on them, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory is located at "/public/home/username" and has a 30GB quota. The group storage directories are created per PI, and each group directory has a default 15TB quota (although in some cases the quota is higher). For example, if David Haussler is the PI that you report to directly, then the directory would exist as /public/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the '/usr/bin/viewquota' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /public/groups/hausslerlab for example, you would do:
$ viewquota hausslerlab
Project quota on /export (/dev/mapper/export)
Project ID Used Soft Hard Warn/Grace
---------- ---------------------------------
hausslerlab 1.8T 15T 16T 00 [------]
== Actually Doing Work and Computing ==
When doing research and running jobs, please be mindful of your resource consumption on the server you are on. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are not sure of your potential RAM, CPU or disk impact, start small with one or two processes and work your way up from there. Also, before starting your work, run the 'top' command to see who and what else is running and what resources are already being consumed. If, after starting a process, you realize that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Serving Files to the Public via the Web ==
If you want to set up a web page on courtyard, or serve files over HTTP from there, do this:
mkdir /public/home/''your_username''/public_html
chmod 755 /public/home/''your_username''/public_html
Put data in the public_html directory. The URL will be:
http://public.gi.ucsc.edu/''~username''/
== /data/scratch Space on the Servers ==
Each server will generally have a local /data/scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /data/scratch is not backed up, and the data there could disappear in the event of a disk failure or other problem. Do not store important data there; if it is important, move it somewhere else very soon after creation.
ad050e29409f87b61cd5be5efc9ca7a923e8f67b
GPU Resources
0
36
432
327
2023-10-26T13:51:29Z
Weiler
3
wikitext
text/x-wiki
When submitting jobs, you can ask for GPUs in one of two ways. One is:
#SBATCH --gres=gpu:1
That will ask for 1 GPU generically on a node with a free GPU. This request is more specific:
#SBATCH --gres=gpu:A5500:3
That requests 3 A5500 GPUs '''only'''.
We have several GPU types on the cluster which may fit your specific needs:
nVidia RTX A5500 : 24GB RAM
nVidia A100 : 80GB RAM
For the most part, Slurm takes care of making sure that each job only sees and uses the GPUs assigned to it. Within the job, '''CUDA_VISIBLE_DEVICES''' will be set in the environment, but it will always be set to a list of your requested number of GPUs, starting at 0: Slurm re-numbers the GPUs assigned to each job so that, within the job, they appear to start at 0. If you need access to the "real" GPU numbers (to log, or to pass along to Docker), they are available in the '''SLURM_JOB_GPUS''' (for '''sbatch''') or '''SLURM_STEP_GPUS''' (for '''srun''') environment variable.
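As a quick sanity check, you can print both numberings from inside a job (a sketch; the values you see depend on which physical GPUs Slurm happens to assign):

```shell
# Request two GPUs and compare the job-local numbering with the physical one
srun --gres=gpu:2 bash -c \
  'echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"; echo "SLURM_STEP_GPUS=$SLURM_STEP_GPUS"'
```

CUDA_VISIBLE_DEVICES will typically show 0,1, while SLURM_STEP_GPUS shows the physical indices of the GPUs you were actually given.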
==Running GPU Workloads==
To actually use an nVidia GPU, you need to run a program that uses the CUDA API. There are a few ways to obtain such a program.
===Prebuilt CUDA Applications===
The Slurm cluster nodes have the nVidia drivers installed, as well as basic CUDA tools like nvidia-smi.
Some projects, such as tensorflow, may ship pre-built binaries that can use CUDA. You should be able to run these binaries directly, if you download them.
===Building CUDA Applications===
The cluster nodes do not have the full CUDA Toolkit. In particular, they do not have the '''nvcc''' CUDA-enabled compiler. If you want to compile applications that use CUDA, you will need to install the development environment yourself for your user.
Once you have '''nvcc''' available to your user, building CUDA applications should work. To run them, you will have to submit them as jobs, because the head node does not have a GPU.
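One way to get '''nvcc''' into your own user environment (an assumption about your setup; other install methods, such as nVidia's runfile installer, also work) is a user-level conda environment:

```shell
# Create a per-user conda environment containing the CUDA compiler
# (channel and package names assume the official "nvidia" conda channel)
conda create -y -n cuda-dev -c nvidia cuda-toolkit
conda activate cuda-dev
nvcc --version   # verify the compiler is on your PATH
```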
===Containerized GPU Workloads===
Instead of directly installing binaries, or installing and using the CUDA Toolkit, it is often easiest to use containers to download a prebuilt GPU workload and all of its libraries and dependencies. There are a few options for running containerized GPU workloads on the cluster.
====Running Containers in Singularity====
You can run containers on the cluster using Singularity, and give them access to the GPUs that Slurm has selected using the '''--nv''' option. For example:
singularity pull docker://tensorflow/tensorflow:latest-gpu
srun -c 8 --mem 10G --gres=gpu:1 singularity run --nv docker://tensorflow/tensorflow:latest-gpu python -c 'from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())'
This will produce output showing that the Tensorflow container is indeed able to talk to one GPU:
INFO: Using cached SIF image
2023-05-15 11:36:33.110850: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-15 11:36:38.799035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /device:GPU:0 with 22244 MB memory: -> device: 0, name: NVIDIA RTX A5500, pci bus id: 0000:03:00.0, compute capability: 8.6
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 8527638019084870106
xla_global_id: -1
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 23324655616
locality {
bus_id: 1
links {
}
}
incarnation: 1860154623440434360
physical_device_desc: "device: 0, name: NVIDIA RTX A5500, pci bus id: 0000:03:00.0, compute capability: 8.6"
xla_global_id: 416903419
]
Slurm's confinement of the job to the correct set of GPUs also carries through to the Singularity container; there is no need to specifically direct Singularity to use the right GPUs unless you are doing something unusual.
====Running Containers in Slurm====
Slurm itself also supports a '''--container''' option for jobs, which allows a whole job to be run inside a container. If you are able to [https://slurm.schedmd.com/containers.html convert your container to OCI Bundle format], you can pass it directly to Slurm instead of using Singularity from inside the job. However, Docker-compatible image specifiers can't be given to Slurm, only paths to OCI bundles on disk.
Stand-alone tools to download a Docker image from Docker Hub in OCI bundle format ('''skopeo''' and '''umoci''') are not yet installed on the cluster, but the method using the '''docker''' command should work.
Slurm containers ''should'' have access to their assigned GPUs, but it is not clear if tools like '''nvidia-smi''' are injected into the container, as they would be with Singularity or the nVidia Container Runtime.
====Running Containers in Docker====
You might be used to running containers with Docker, or containerized GPU workloads with the nVidia Container Runtime or Toolkit. Docker is installed on all the nodes and the daemon is running; if the '''docker''' command does not work for you, ask cluster-admin to add you to the right groups.
The '''nvidia''' runtime is set up and will automatically be used.
While Slurm configures each Slurm job with a cgroup that directs it to the correct GPUs, '''using Docker to run another container escapes Slurm's confinement''', and using '''--gpus=1''' will ''always'' use the ''first'' GPU in the system, whether that GPU is assigned to your job or not. When using Docker, you ''must'' consult the '''SLURM_JOB_GPUS''' (for '''sbatch''') or '''SLURM_STEP_GPUS''' (for '''srun''') environment variable and pass that along to your container. You should also impose limits on all other resources used by your Docker container, so that your whole job stays within the resources allocated by Slurm's scheduler. (TODO: find out how cgroups handles oversubscription between a Docker container and the Slurm container that launched it.)
An example of a working command is:
srun -c 1 --mem 4G --gres=gpu:2 bash -c 'docker run --rm --gpus=\"device=$SLURM_STEP_GPUS\" nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi'
Note that the double-quotes are included in the argument to '''--gpus''' as seen by the Docker client, and that '''bash''' and single-quotes are used to ensure that '''$SLURM_STEP_GPUS''' is evaluated within the job itself, and not on the head node.
29a5e42ba6c3394828af4d61106b83e7be9c4af0
Slurm Queues (Partitions) and Resource Management
0
48
446
440
2023-11-29T18:07:38Z
Weiler
3
wikitext
text/x-wiki
== Partitions ==
Due to heterogeneous workloads and different batch requirements, we have implemented partitions in slurm, which are similar to queues.
Each partition has different default and maximum walltime limits (aka "runtime" limits). You will need to select a partition to launch your jobs in based on what kind of jobs they are and how long they are expected to run.
{| class="wikitable"
|- style="font-weight:bold;"
! Partition Name
! Default Walltime Limit
! Maximum Walltime Limit
! style="border-color:inherit;" | Default Partition?
! Job Priority
! Maximum Nodes Utilized
|-
| short
| 10 minutes
| 1 hour
| style="border-color:inherit;" | Yes
| Normal
| All
|-
| medium
| 1 hour
| 12 hours
| style="border-color:inherit;" | No
| Normal
| 15
|-
| long
| 12 hours
| 7 days
| style="border-color:inherit;" | No
| Normal
| 10
|-
| high_priority
| 10 minutes
| 7 days
| style="border-color:inherit;" | No
| High
| All<br />
|-
| gpu
| 10 minutes
| 7 days
| No
| Normal
| 6
|}
If you do not specify a partition to run your job in, it will automatically be assigned to the "short" partition by default. If you do not specify a walltime value in your job submission script, it will inherit the "Default Walltime Limit" of the partition to which it is assigned. Therefore, it is a very good idea to specify both the partition your job will go in and a walltime limit; otherwise your jobs will inherit the defaults in the chart above.
This all means that it is very important to '''TEST''' your jobs before running many of them! Submit one job and note the resources it takes (RAM, CPU) and how long it takes to run. Then, when you submit many of those jobs, you can correctly specify the number of CPU cores each job needs, how much RAM it needs (pad it by about 20% just in case), and how much time it needs to run (pad it by about 40% to account for variable conditions like disk IO load and CPU context switching load).
You can test your jobs by running one job via '''srun''' with fairly high CPU, RAM and walltime limits (just so it isn't killed due to default limits), then noting how many resources it consumed while running (after it finishes).
'''Example'''
seff 769059
'''Output'''
Job ID: 769059
Cluster: phoenix
User/Group: <user-name>/<group-name>
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 16
CPU Utilized: 00:00:01
CPU Efficiency: 0.11% of 00:15:28 core-walltime
Job Wall-clock time: 00:00:58
Memory Utilized: 4.79 MB
Memory Efficiency: 4.79% of 100.00 MB
So if I needed to run about 1,000 of these jobs, and they were all similar, I would select the "short" partition, 1 CPU core, perhaps 8MB of RAM, and perhaps a 90-second walltime limit. Note how I padded the RAM and walltime a bit to account for unexpectedly variable cluster conditions.
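Expressed as batch-file directives, that choice might look like this (the numbers are the illustrative ones from the example above):

```shell
#SBATCH --partition=short      # short partition: jobs expected to finish within 1 hour
#SBATCH --cpus-per-task=1
#SBATCH --mem=8M               # ~20% padding over the 4.79 MB observed
#SBATCH --time=00:01:30        # ~40% padding over the 58 s observed
```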
== '''high_priority''' Partition Notes ==
The "high_priority" partition is special in that its jobs have the highest priority on the cluster and will push all other jobs aside so that jobs in that partition finish as fast as possible. It is only available for emergency or mission-critical batches that must be completed unexpectedly fast. Access to this partition is granted on a per-request basis only, and is temporary until your batch finishes. Email '''cluster-admin@soe.ucsc.edu''' if you need access to the high_priority queue, and make your case for why it is necessary.
250d6cbc10d82c5b9e45d1921dce1c3a8126f0eb
Slurm Tips for Toil
0
38
441
371
2023-11-20T21:54:12Z
Anovak
4
Show how to install with extras and a branch
wikitext
text/x-wiki
Here are some tips for running Toil workflows on the Phoenix Slurm cluster. Mostly you might want to run WDL workflows, but you can use some of these for other workflows like Cactus. You can also consult [https://github.com/DataBiosphere/toil/blob/master/docs/running/wdl.rst#running-wdl-with-toil the Toil documentation on WDL workflows].
* Install Toil with WDL support with:
pip3 install --upgrade 'toil[wdl]'
To use a development version of Toil, you can install from source instead:
pip3 install 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl]'
Or for a particular branch:
pip3 install 'git+https://github.com/DataBiosphere/toil.git@issues/123-abc#egg=toil[wdl]'
* You will then need to make sure your '''~/.local/bin''' directory is on your PATH. Open up your '''~/.bashrc''' file and add:
export PATH=$PATH:$HOME/.local/bin
Then make sure to log out and back in again.
* For Toil options, you will want '''--batchSystem slurm''' to make it use Slurm and '''--batchLogsDir ./logs''' (or some other location on a shared filesystem) for the Slurm logs to not get lost.
* You may be able to speed up your workflow with '''--caching true''', to cache data on nodes to be shared among multiple simultaneous tasks.
* If using '''toil-wdl-runner''', you might want to add '''--jobStore ./jobStore''' to make sure the job store is in a defined, shared location so that you can use '''--restart''' later.
* If using '''toil-wdl-runner''', you will want to set the '''SINGULARITY_CACHEDIR''' and '''MINIWDL__SINGULARITY__IMAGE_CACHE''' environment variables for your workflow to locations on shared storage, possibly the default cache locations in your home directory. Otherwise Toil will set them to node-local directories, and thus re-download images for each workflow run and for each cluster node. To avoid this, you can, for example, set them before your run or in your '''~/.bashrc''':
export SINGULARITY_CACHEDIR=$HOME/.singularity/cache
export MINIWDL__SINGULARITY__IMAGE_CACHE=$HOME/.cache/miniwdl
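Putting the options above together, a full invocation might look like the following sketch. '''workflow.wdl''' and '''inputs.json''' are placeholder names, and the cache paths are the home-directory defaults suggested above; adjust everything to your own shared storage.

```shell
# Hypothetical end-to-end toil-wdl-runner invocation on Phoenix.
# workflow.wdl and inputs.json are placeholders for your own files.
export SINGULARITY_CACHEDIR=$HOME/.singularity/cache
export MINIWDL__SINGULARITY__IMAGE_CACHE=$HOME/.cache/miniwdl
toil-wdl-runner workflow.wdl inputs.json \
    --jobStore ./jobStore \
    --batchSystem slurm \
    --batchLogsDir ./logs \
    --caching true
```

With '''--jobStore''' in a known shared location, a failed run can later be resumed by re-running the same command with '''--restart'''.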
a349216df5a7dcd00c1b6c5a35c0bc2e0a5f6619
Running a Container as a non-root User
0
49
443
2023-11-27T03:38:05Z
Weiler
3
Created page with "Information here pulled from an article by Lucas Wilson-Richter on medium.com: https://medium.com/redbubble/running-a-docker-container-as-a-non-root-user-7d2e00f8ee15 =='''The Problem: Docker writes files as root'''== Sometimes, when we run builds in Docker containers, the build creates files in a folder that’s mounted into the container from the host (e.g. the source code directory). This can cause us pain, because those files will be owned by the root user. When..."
wikitext
text/x-wiki
Information here pulled from an article by Lucas Wilson-Richter on medium.com:
https://medium.com/redbubble/running-a-docker-container-as-a-non-root-user-7d2e00f8ee15
=='''The Problem: Docker writes files as root'''==
Sometimes, when we run builds in Docker containers, the build creates files in a folder that’s mounted into the container from the host (e.g. the source code directory). This can cause us pain, because those files will be owned by the root user. When an ordinary user tries to clean those files up when preparing for the next build (for example by using git clean), they get an error and our build fails.
There are a few ways we could deal with this problem:
* We could try to prevent the build from creating any files, but that’s very limiting — we lose the ability to generate assets, or write any data to the disk. This is definitely too restrictive to solve the problem in a way that I could use with any build.
* We could tell Git to ignore the affected files, but that carries the risk that they’ll hang around in the file system and have an effect on future builds. We’ve encountered that problem in the past at Redbubble, so we are wary about letting that happen again.
* We could clean up the files at the end of the build, while we’re still running our Dockerised process. But that would require us to implement lots of error trapping logic to ensure the cleanup happens, but still exit the build with the correct result.
It would be more elegant if we could simply create files in a way that allows ordinary users to delete them. For example, we could tell Docker to run as an ordinary user instead of root.
=='''Time to be someone else'''==
Fortunately, docker run gives us a way to do this: the --user parameter. We're going to use it to specify the user ID (UID) and group ID (GID) that Docker should use. This works because Docker containers all share the same kernel, and therefore the same list of UIDs and GIDs, even if the associated usernames are not known to the containers (more on that later).
To run our asset build, we could use a command something like this:
# Mount the source code, set the working dir, run as the given user
# in our build environment image, and run the build command:
docker container run --rm -it \
  -v "$(pwd)":/app \
  --workdir /app \
  --user 1000:1000 \
  my-docker/my-build-environment:latest \
  make assets
This will tell Docker to run its processes with user ID 1000 and group ID 1000. That will mean that any files created by that process also belong to the user with ID 1000.
=='''But I just want to be me!'''==
But what if we don’t know the current user’s ID? Is there some way to automatically discover that?
There is: id is a program for finding out exactly this information. We can use it with the -u switch to get the UID, and the -g switch to get the GID. So instead of setting --user 1000:1000, we could use subshells to set
--user $(id -u):$(id -g). That way, we can always use the current user's UID and GID.
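The subshell trick is easy to check directly in any shell; on a typical Linux host both commands print plain integers:

```shell
# Build the value for docker run's --user flag from the current
# user's numeric IDs: id -u prints the UID, id -g the primary GID.
user_arg="$(id -u):$(id -g)"
echo "$user_arg"   # e.g. 1000:1000
```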
=='''docker-compose'''==
We often like to run our tests and things using docker-compose, so that we can spin up any required services as needed - databases and so on. So wouldn't it be nice if we could do this with docker-compose as well?
Unfortunately, we can’t use subshells in a compose file — it’s not a supported part of the format. Lucky for us, we can insert environment variables. So if we have a docker-compose.yml like this:
# This is an abbreviated example docker-compose.yml
version: '3.3'
services:
  rspec:
    image: my-docker/my-build-environment:latest
    environment:
      - RAILS_ENV=test
    command: ["make", "assets"]
    user: ${CURRENT_UID} # THIS BIT!!!1!
    volumes:
      - .:/app
We could use a little bash to set that variable and start docker-compose:
CURRENT_UID=$(id -u):$(id -g) docker-compose up
Et voila! Our Dockerised script will create files as if it were the host user!
=='''Gotchas'''==
'''Your user will be $HOME-less.'''
What we’re actually doing here is asking our Docker container to do things using the ID of a user it knows nothing about, and that creates some complications. Namely, it means that the user is missing some of the things we’ve learned to simply expect users to have — things like a home directory. This can be troublesome, because it means that all the things that live in $HOME — temporary files, application settings, package caches — now have nowhere to live. The containerised process just has no way to know where to put them.
This can impact us when we’re trying to do user-specific things. We found that it caused problems using gem install (though using Bundler is OK), or running code that relies on ENV['HOME']. So it may mean that you need to make some adjustments if you do either of those things.
'''Your user will be nameless, too'''
It also turns out that we can’t easily share usernames between a Docker host and its containers. That’s why we can’t just use docker run --user=$(whoami) — the container doesn't know about your username. It can only find out about your user by its UID.
That means that when you run whoami inside your container, you'll get a result like I have no name!. That's entertaining, but if your code relies on knowing your username, you might get some confusing results.
'''Wrapping Up'''
We now have a way to use docker run and docker-compose to create files, without having to use sudo to clean them up!
Happy building!
e3a4c6e373cdab0e2d7380222d55a8d27ffaa30c
Overview of using Slurm
0
32
445
370
2023-11-29T17:39:48Z
Weiler
3
wikitext
text/x-wiki
When using Slurm, you will need to log into the Slurm head node (currently phoenix.prism). Once you have SSH'd in, you can execute Slurm batch or interactive commands.
You might also want to consult the [[Quick Reference Guide]].
== Submit a Slurm Batch Job ==
In order to submit a Slurm batch job, you will need to create a directory that you have read and write access to on all the nodes (which will often be a shared space). Say I have a batch named "experiment-1"; I would create that directory in my group's area:
% mkdir -p /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1
% cd /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1
Then you will need to create your job submission batch file. It will look something like this. My file is called 'slurm-test.sh':
% vim slurm-test.sh
Then populate the file as necessary:
#!/bin/bash
# Job name:
#SBATCH --job-name=weiler_test
#
# Partition - This is the queue it goes in:
#SBATCH --partition=short
#
# Where to send email (optional)
#SBATCH --mail-user=weiler@ucsc.edu
#
# Number of nodes you need per job:
#SBATCH --nodes=1
#
# Memory needed for the jobs. Try very hard to make this accurate. DEFAULT = 4gb
#SBATCH --mem=4gb
#
# Number of tasks (one for each CPU desired for use case) (example):
#SBATCH --ntasks=1
#
# Processors per task:
# Request at least eight CPUs per NVIDIA RTX A5500 GPU requested
#SBATCH --cpus-per-task=1
#
# Number of GPUs, this can be in the format of "--gres=gpu:[1-8]", or "--gres=gpu:A5500:[1-8]" with the type included (optional)
#SBATCH --gres=gpu:1
#
# Standard output and error log
#SBATCH --output=serial_test_%j.log
#
# Wall clock limit in hrs:min:sec:
#SBATCH --time=00:00:30
#
## Command(s) to run (example):
pwd; hostname; date
echo "Running test script on a single CPU core"
sleep 5
echo "Test done!"
date
Keep the "SBATCH" lines commented; the scheduler reads them anyway. If you don't need a particular option, just leave it out of the file.
To submit the batch job:
% sbatch slurm-test.sh
Submitted batch job 7
The job(s) will then be scheduled. You can see the state of the queue as such:
% squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
7 batch weiler_t weiler R 0:07 1 phoenix-01
The job writes any STDOUT or STDERR to the log file in the directory you launched it from. Other than that, it simply does whatever the job script does, even if it produces no output.
== Launching Several Jobs at Once ==
You can launch many jobs at once using the $SLURM_ARRAY_TASK_ID variable. Add something like the following to your batch submission file:
#SBATCH --array=0-31
#SBATCH --output=array_job_%A_task_%a.out
#SBATCH --error=array_job_%A_task_%a.err
## Command(s) to run:
echo "I am task $SLURM_ARRAY_TASK_ID"
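Inside each array task, '''$SLURM_ARRAY_TASK_ID''' holds that task's index, so a common pattern is to use it to select one input per task. A minimal sketch (the sample file names are made up, and the variable is defaulted to 0 so the snippet also runs outside a job):

```shell
# Each array task gets a distinct SLURM_ARRAY_TASK_ID (0-31 with --array=0-31).
# Outside a real Slurm job the variable is unset, so default it to 0 here.
SLURM_ARRAY_TASK_ID=${SLURM_ARRAY_TASK_ID:-0}
inputs=(sample_a.fastq sample_b.fastq sample_c.fastq)  # hypothetical inputs
my_input=${inputs[$SLURM_ARRAY_TASK_ID]}
echo "Task $SLURM_ARRAY_TASK_ID processes $my_input"
```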
== CGROUPS and Resource Management ==
Our installation of Slurm uses Linux cgroups, which put a hard resource cap on jobs. If you declare that your job needs 4GB of RAM and it uses 5GB, it will fail with an OOM exception. The same goes for CPU and GPU resources: if your job ends up using more than you specify, it will fail. Likewise for the '''--time''' option: your job will be killed if it runs longer than the limit you specify there. This keeps runaway jobs that use more resources than expected from crashing the nodes.
So... TEST YOUR JOBS! Find out how much in the way of resources a single job needs before you launch 100 of them.
== TEST YOUR JOBS! ==
Let me say that one more time: test your jobs before launching a bunch of them! If a job fails, you don't want it to fail 100 or more times. Testing also gives you a good idea of how much RAM and CPU each job needs, so you can define your batch files appropriately.
82f2a085eeb65518f90700cc564f1df13365cc6a
Slurm Tips for vg
0
37
453
320
2024-01-05T15:11:59Z
Anovak
4
wikitext
text/x-wiki
This page explains how to set up a development environment for [https://github.com/vgteam/vg vg] on the Phoenix cluster.
==Setting Up==
1. After connecting to the VPN, connect to the cluster head node:
ssh phoenix.prism
This node is relatively small, so you shouldn't run real work on it, but it is the place you need to be to submit Slurm jobs.
2. Make yourself a user directory under '''/private/groups''', which is where large data must be stored. For example, if you are in the Paten lab:
mkdir /private/groups/patenlab/$USER
3. (Optional) Symlink it into your home directory, so you can conveniently keep your repos on that storage. The '''/private/groups''' storage may be faster than the home directory storage.
mkdir -p /private/groups/patenlab/$USER/workspace
ln -s /private/groups/patenlab/$USER/workspace ~/workspace
4. Make sure you have SSH keys created and add them to Github.
cat ~/.ssh/id_ed25519.pub || (ssh-keygen -t ed25519 && cat ~/.ssh/id_ed25519.pub)
# Paste into https://github.com/settings/ssh/new
5. Make a place to put your clone, and clone vg:
mkdir -p ~/workspace
cd ~/workspace
git clone --recursive git@github.com:vgteam/vg.git
cd vg
6. vg's dependencies should already be installed on the cluster nodes. If any of them seem to be missing, tell cluster-admin@soe.ucsc.edu to install them.
7. Build vg as a Slurm job. This will send the build out to the cluster as a 64-core, 80G memory job, and keep the output logs in your terminal.
srun -c 64 --mem=80G --time=00:30:00 make -j64
This will leave your vg binary at '''~/workspace/vg/bin/vg'''.
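You may want the freshly built binary on your PATH; a small sketch assuming the '''~/workspace/vg''' clone location used above:

```shell
# Put the freshly built vg first on the PATH so it shadows any
# system-wide copy (assumes the ~/workspace/vg clone location above).
export PATH="$HOME/workspace/vg/bin:$PATH"
echo "$PATH" | cut -d: -f1   # first PATH entry is now the vg bin dir
```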
==Misc Tips==
* If you want an interactive session with appreciable resources, you can schedule one with '''srun'''. For example, to get 16 cores and 120G memory all for you, run:
srun -c 16 --mem 120G --time=08:00:00 --partition=medium --pty bash -i
* To send out a job without making a script file for it, use '''sbatch --wrap "your command here"'''.
* You can use arguments from SBATCH lines on the command line!
* You can use [https://github.com/CLIP-HPC/SlurmCommander#readme Slurm Commander] to watch the state of the cluster with the '''scom''' command.
a093b972ba428e3dc9f1d57de47a300eb3866c0d
Slurm Queues (Partitions) and Resource Management
0
48
474
456
2024-03-25T20:29:53Z
Anovak
4
/* My job is not running but I want it to be running! */
wikitext
text/x-wiki
== Partitions ==
Due to heterogeneous workloads and differing batch requirements, we have implemented partitions in Slurm, which are similar to queues.
Each partition has different default and maximum walltime limits (aka "runtime" limits). You will need to select a partition to launch your jobs in based on what kind of jobs they are and how long they are expected to run.
{| class="wikitable"
|- style="font-weight:bold;"
! Partition Name
! Default Walltime Limit
! Maximum Walltime Limit
! style="border-color:inherit;" | Default Partition?
! Job Priority
! Maximum Nodes Utilized
|-
| short
| 10 minutes
| 1 hour
| style="border-color:inherit;" | Yes
| Normal
| All
|-
| medium
| 1 hour
| 12 hours
| style="border-color:inherit;" | No
| Normal
| 15
|-
| long
| 12 hours
| 7 days
| style="border-color:inherit;" | No
| Normal
| 10
|-
| high_priority
| 10 minutes
| 7 days
| style="border-color:inherit;" | No
| High
| All<br />
|-
| gpu
| 10 minutes
| 7 days
| No
| Normal
| 6
|}
If you do not specify a partition for your job (with e.g. <code>--partition=medium</code>), it will automatically be assigned to the "short" partition. If you do not specify a walltime in your job submission script (with e.g. <code>--time=00:30:00</code>), it will inherit the "Default Walltime Limit" of its partition, as shown in the chart above. It is therefore a very good idea to specify both a partition and a walltime limit explicitly.
This all means that it is very important to '''TEST''' your jobs before running many of them! Submit one job and note how many resources it uses (RAM, CPU) and how long it runs. Then, when you submit many such jobs, you can correctly specify the number of CPU cores, the RAM it needs (pad it by about 20% just in case), and the time it needs to run (pad it by about 40% to account for environmental variability such as disk I/O load and CPU context-switching load).
You can test your jobs by running one job via '''srun''' with fairly high CPU, RAM and walltime limits (just so it isn't killed due to default limits), then noting how much in resources it consumed while running (after it finishes).
'''Example'''
seff 769059
'''Output'''
Job ID: 769059
Cluster: phoenix
User/Group: <user-name>/<group-name>
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 16
CPU Utilized: 00:00:01
CPU Efficiency: 0.11% of 00:15:28 core-walltime
Job Wall-clock time: 00:00:58
Memory Utilized: 4.79 MB
Memory Efficiency: 4.79% of 100.00 MB
So if I needed to run around 1000 of these jobs, and they were all similar, I would select the "short" partition, 1 CPU core, maybe 8MB of RAM, and maybe a 90-second walltime limit. Note how I padded the RAM and walltime a bit to account for unexpectedly variable cluster conditions.
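The padding can be computed mechanically. A small sketch using the seff numbers from this example (integer math, rounding up); it lands in the same ballpark as the 8MB / 90s chosen above, which round up a bit further:

```shell
# seff reported about 4.79 MB of RAM and a 58-second wall clock.
measured_mb=5      # measured memory, rounded up to whole MB
measured_sec=58    # measured wall-clock seconds
padded_mb=$(( (measured_mb * 120 + 99) / 100 ))    # +20% RAM headroom
padded_sec=$(( (measured_sec * 140 + 99) / 100 ))  # +40% walltime headroom
echo "request at least --mem=${padded_mb}mb and ${padded_sec} seconds of walltime"
```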
== '''high_priority''' Partition Notes ==
The "high_priority" partition is special in that its jobs have the highest priority on the cluster and will push all other jobs aside in an effort to finish as fast as possible. It is only available for emergency or mission-critical batches that must complete unexpectedly fast. Access to this partition is granted only on a per-request basis, and is temporary until your batch finishes. Email '''cluster-admin@soe.ucsc.edu''' if you need access to the high_priority queue, and make your case for why it is necessary.
== My job is not running but I want it to be running ==
Even if your job is in the high-priority partition, that doesn't mean that the cluster will drop everything and run it immediately. Because we don't have pre-emption set up, high priority jobs still have to wait for currently-running jobs to finish, as well as for other high-priority jobs. And since, as noted above, jobs can be allowed to run for up to 7 days each, it is physically possible for even the highest-priority job in the whole cluster to not start for a whole week.
Here is a [https://docs-research-it.berkeley.edu/services/high-performance-computing/user-guide/running-your-jobs/why-job-not-run/ good resource from Berkeley] about understanding and debugging Slurm job scheduling. Basically, Slurm uses the wall-clock limits of running jobs, and of jobs in the queue, to make a plan to start each job on some node at some time in the future. If jobs finish early, other jobs can start sooner than scheduled, and if there is space around higher-priority jobs, lower-priority jobs can be filled in.
If you want to know when Slurm plans to run your job, and why that is not right now, you can use the <code>--start</code> option for the <code>squeue</code> command:
$ squeue -j 1719584 --start
JOBID PARTITION NAME USER ST START_TIME NODES SCHEDNODES NODELIST(REASON)
1719584 short snakemak flastnam PD 2024-01-22T10:20:00 1 phoenix-00 (Priority)
The <code>START_TIME</code> column is the time by which Slurm is sure it will be able to start your job if no higher-priority jobs come in first, and the <code>NODELIST(REASON)</code> column shows the nodes the job is running on, or the reason it is not running now, in parentheses. In this case, the job is not running because higher-priority jobs are in the way.
2d70965d268255f270701b8eb46a2ad0d550d477
Slurm Tips for vg
0
37
455
453
2024-01-05T15:35:25Z
Anovak
4
/* Setting Up */
wikitext
text/x-wiki
This page explains how to set up a development environment for [https://github.com/vgteam/vg vg] on the Phoenix cluster.
==Setting Up==
1. After connecting to the VPN, connect to the cluster head node:
ssh phoenix.prism
This node is relatively small, so you shouldn't run real work on it, but it is the place you need to be to submit Slurm jobs.
2. Make yourself a user directory under '''/private/groups''', which is where large data must be stored. For example, if you are in the Paten lab:
mkdir /private/groups/patenlab/$USER
3. (Optional) Link it over to your home directory, so it is easy to use storage there to store your repos. The '''/private/groups''' storage may be faster than the home directory storage.
mkdir -p /private/groups/patenlab/$USER/workspace
ln -s /private/groups/patenlab/$USER/workspace ~/workspace
4. Make sure you have SSH keys created and add them to Github.
cat ~/.ssh/id_ed25519.pub || (ssh-keygen -t ed25519 && cat ~/.ssh/id_ed25519.pub)
# Paste into https://github.com/settings/ssh/new
5. Make a place to put your clone, and clone vg:
mkdir -p ~/workspace
cd ~/workspace
git clone --recursive git@github.com:vgteam/vg.git
cd vg
6. vg's dependencies should already be installed on the cluster nodes. If any of them seem to be missing, email '''cluster-admin@soe.ucsc.edu''' and ask to have them installed.
7. Build vg as a Slurm job. This will send the build out to the cluster as a 64-core, 80G memory job, and keep the output logs in your terminal.
srun -c 64 --mem=80G --time=00:30:00 make -j64
This will leave your vg binary at '''~/workspace/vg/bin/vg'''.
==Misc Tips==
* If you want an interactive session with appreciable resources, you can schedule one with '''srun'''. For example, to get 16 cores and 120G memory all for you, run:
srun -c 16 --mem 120G --time=08:00:00 --partition=medium --pty bash -i
* To send out a job without making a script file for it, use '''sbatch --wrap "your command here"'''.
* Any option you would put on an <code>#SBATCH</code> line in a batch script can also be passed directly on the '''sbatch''' command line.
* You can use [https://github.com/CLIP-HPC/SlurmCommander#readme Slurm Commander] to watch the state of the cluster with the '''scom''' command.
cd1f2443e3ecdcb2a91b221ed237507a23852045
Genomics Institute Computing Information
0
6
457
442
2024-01-25T02:57:01Z
Weiler
3
/* GI Firewalled Computing Environment (PRISM) */
wikitext
text/x-wiki
Welcome to the Genomics Institute Computing Information Repository! Browse the topics below for help in the area you are curious about.
== GI Public Computing Environment ==
*[[How to access the public servers]]
== GI Firewalled Computing Environment (PRISM) ==
*[[Access to the Firewalled Compute Servers]]
*[[Firewalled Computing Resources Overview]]
*[[Firewalled Environment Storage Overview]]
*[[Firewalled User Account and Storage Cost]]
*[[Grafana Performance Metrics]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Cluster Etiquette]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[GPU Resources]]
*[[Quick Reference Guide]]
*[[Slurm Queues (Partitions) and Resource Management]]
*[[Slurm Tips for vg]]
*[[Slurm Tips for Toil]]
*[[Using Docker under Slurm]]
*[[Phoenix WDL Tutorial]]
==General Docker Information==
*[[Running a Container as a non-root User]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
4133e3ae4508aaffec5ad9057c9b7647a8083465
478
457
2024-04-26T18:04:47Z
Weiler
3
/* GI Firewalled Computing Environment (PRISM) */
wikitext
text/x-wiki
Welcome to the Genomics Institute Computing Information Repository! Browse the topics below for help in the area you are curious about.
== GI Public Computing Environment ==
*[[How to access the public servers]]
== GI Firewalled Computing Environment (PRISM) ==
*[[Access to the Firewalled Compute Servers]]
*[[Firewalled Computing Resources Overview]]
*[[Firewalled Environment Storage Overview]]
*[[Firewalled User Account and Storage Cost]]
*[[Grafana Performance Metrics]]
*[[Visual Studio Code (vscode) Configuration Tweaks]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Cluster Etiquette]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[GPU Resources]]
*[[Quick Reference Guide]]
*[[Slurm Queues (Partitions) and Resource Management]]
*[[Slurm Tips for vg]]
*[[Slurm Tips for Toil]]
*[[Using Docker under Slurm]]
*[[Phoenix WDL Tutorial]]
==General Docker Information==
*[[Running a Container as a non-root User]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
dae2f38558ffc2605964b35a24843e2d6d376e19
Grafana Performance Metrics
0
50
458
2024-01-25T03:00:42Z
Weiler
3
Created page with "We are tracking server and cluster node performance metrics over time via the '''Grafana''' software suite. If you want to see past and present performance metrics of a particular server or phoenix cluster node, make sure you are connected to the VPN, then navigate to this website: http://grafana.prism Then login using the following credentials: username: guest password: MoreStats4me"
wikitext
text/x-wiki
We are tracking server and cluster node performance metrics over time via the '''Grafana''' software suite. If you want to see past and present performance metrics of a particular server or phoenix cluster node, make sure you are connected to the VPN, then navigate to this website:
http://grafana.prism
Then login using the following credentials:
username: guest
password: MoreStats4me
106bd17ef6bfd9978158b1a56c272dd4854350c6
460
458
2024-01-25T03:04:16Z
Weiler
3
wikitext
text/x-wiki
We are tracking server and cluster node performance metrics over time via the '''Grafana''' software suite. If you want to see past and present performance metrics of a particular server or phoenix cluster node, make sure you are connected to the VPN, then navigate to this website:
http://grafana.prism
Then login using the following credentials:
username: guest
password: MoreStats4me
Once logged in, click the small button on the top left of the window with the three small horizontal bars in it, and navigate to the "Dashboards" menu item.
[[File:grafana_menu.png|900px]]
Then,
37df8302871d1a190029c4b0293d31bbec2437ee
462
460
2024-01-25T03:07:24Z
Weiler
3
wikitext
text/x-wiki
We are tracking server and cluster node performance metrics over time via the '''Grafana''' software suite. If you want to see past and present performance metrics of a particular server or phoenix cluster node, make sure you are connected to the VPN, then navigate to this website:
http://grafana.prism
Then login using the following credentials:
username: guest
password: MoreStats4me
Once logged in, click the small button on the top left of the window with the three small horizontal bars in it, and navigate to the "Dashboards" menu item.
[[File:grafana_menu.png|900px]]
From there, you should be able to see the various sub-folders of the different dashboards for different classes of machines.
[[File:grafana_dashboards.png|900px]]
94ea01f302cde36eec28a2520e078b135f0b581b
463
462
2024-01-25T03:07:49Z
Weiler
3
wikitext
text/x-wiki
We are tracking server and cluster node performance metrics over time via the '''Grafana''' software suite. If you want to see past and present performance metrics of a particular server or phoenix cluster node, make sure you are connected to the VPN, then navigate to this website:
http://grafana.prism
Then login using the following credentials:
username: guest
password: MoreStats4me
Once logged in, click the small button on the top left of the window with the three small horizontal bars in it, and navigate to the "Dashboards" menu item.
[[File:grafana_menu.png|900px]]
From there, you should be able to see the various sub-folders of the different dashboards for different classes of machines.
[[File:grafana_dashboards.png|1200px]]
ec75acd0f9eac601aa52e889b2ad47ad535eb163
464
463
2024-01-25T03:35:38Z
Weiler
3
wikitext
text/x-wiki
We are tracking server and cluster node performance metrics over time via the '''Grafana''' software suite. This is only available in the firewalled/PRISM area. If you want to see past and present performance metrics of a particular server or phoenix cluster node, make sure you are connected to the VPN, then navigate to this website:
http://grafana.prism
Then login using the following credentials:
username: guest
password: MoreStats4me
Once logged in, click the small button on the top left of the window with the three small horizontal bars in it, and navigate to the "Dashboards" menu item.
[[File:grafana_menu.png|900px]]
From there, you should be able to see the various sub-folders of the different dashboards for different classes of machines.
[[File:grafana_dashboards.png|1200px]]
0bd5db29b29fe49974b4d334b4798d82857f5a9c
465
464
2024-01-31T02:39:27Z
Weiler
3
wikitext
text/x-wiki
We are tracking server and cluster node performance metrics over time via the '''Grafana''' software suite. This is only available in the firewalled/PRISM area. If you want to see past and present performance metrics of a particular server or phoenix cluster node, make sure you are connected to the VPN, then navigate to this website:
http://grafana.prism/dashboards
Then login using the following credentials:
username: guest
password: MoreStats4me
Once logged in, click the small button on the top left of the window with the three small horizontal bars in it, and navigate to the "Dashboards" menu item.
[[File:grafana_menu.png|900px]]
From there, you should be able to see the various sub-folders of the different dashboards for different classes of machines.
[[File:grafana_dashboards.png|1200px]]
2af934a5dfc2d0596e6423616bf12075d09d5cbd
File:Grafana menu.png
6
51
459
2024-01-25T03:02:41Z
Weiler
3
wikitext
text/x-wiki
da39a3ee5e6b4b0d3255bfef95601890afd80709
File:Grafana dashboards.png
6
52
461
2024-01-25T03:06:02Z
Weiler
3
wikitext
text/x-wiki
da39a3ee5e6b4b0d3255bfef95601890afd80709
Phoenix WDL Tutorial
0
45
466
452
2024-01-31T15:24:57Z
Anovak
4
Fix groups directory path
wikitext
text/x-wiki
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work at that you can install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should '''not''' run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter '''will not''' look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''', to restart bash and pick up the change.
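If you'd rather check the effect without logging out first, you can apply the same <code>export</code> in your current shell and confirm the directory is on your <code>PATH</code>. This is just a quick sanity check, not a substitute for the <code>~/.bashrc</code> edit:

```shell
# Apply the same change to the current shell session only
export PATH="${HOME}/.local/bin:${PATH}"

# Print PATH one entry per line and look for an exact match
echo "$PATH" | tr ':' '\n' | grep -qx "${HOME}/.local/bin" && echo "PATH OK"
```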
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
==Configuring your Phoenix Environment==
'''Do not try to store data in your home directory on Phoenix!''' The home directories are meant for code and programs. Any data worth running a workflow on should be in a directory under <code>/private/groups</code>. You will probably need to email the admins to get added to a group so you can create a directory to work in somewhere under <code>/private/groups</code>. Usually you would end up with <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code>.
Remember this path; we will need it later.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. On the Phoenix cluster, however, we do have a shared filesystem, so we should configure Toil to use it for caching the Docker container images used to run workflow steps. Since these images can be large, and the home directory quota is only 30 GB, we can't keep them in your home directory; we will use the <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code> directory you created earlier.
Record that directory in your <code>~/.bashrc</code> file by editing this command to use your own path, and then running it:
echo 'BIG_DATA_DIR=/private/groups/YOURGROUPNAME/YOURUSERNAME' >>~/.bashrc
Then use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${BIG_DATA_DIR}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${BIG_DATA_DIR}/.cache/miniwdl"' >>~/.bashrc
After that, '''log out and log back in again''', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day].
'''If you get errors about mutexes, lock files, or other weird problems with Singularity''', try moving these directories to inside <code>/data/tmp</code> on the individual nodes, or unsetting them and letting Toil use its defaults (and exhaust our Docker pull limits). [https://github.com/DataBiosphere/toil/issues/4654 It is not clear that <code>/private/groups</code> actually implements the necessary file locking correctly.]
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, go to your user directory under <code>/private/groups</code>, and make a directory to work in.
cd /private/groups/YOURGROUPNAME/YOURUSERNAME
mkdir workflow-test
cd workflow-test
Next, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
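A typo in the inputs file usually only surfaces once the workflow starts, so a quick pre-flight check can save a submission round-trip. Here is a sketch (it assumes <code>python3</code> is available, which it is anywhere you can install Toil with <code>pip</code>):

```shell
# Recreate the two files from above (illustration only)
echo "Mridula" > names.txt
echo '{"hello_caller.who": "./names.txt"}' > inputs.json

# Parse the JSON and confirm the referenced file actually exists
python3 - <<'EOF'
import json, os
inputs = json.load(open("inputs.json"))
path = inputs["hello_caller.who"]
assert os.path.exists(path), f"missing input file: {path}"
print("inputs OK")
EOF
```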
==Testing at small scale single-machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --time=02:00:00 --partition=medium --pty bash -i
This will start a new shell that can run for 2 hours; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare for a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
Additionally, since [https://github.com/DataBiosphere/toil/issues/4686 Toil can't manage Slurm partitions itself], we will use the <code>TOIL_SLURM_ARGS</code> environment variable to tell Toil how long jobs should be allowed to run for (2 hours) and what [[Slurm Queues (Partitions) and Resource Management | partition]] they should go in.
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
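The variables-plus-<code>select_first()</code> pattern above can be hard to follow at first. Here is the same decision logic written as a plain shell function, purely to check our expectations before writing more WDL; it is not part of the workflow itself, and it ignores <code>fizzbuzz_override</code> for brevity:

```shell
# Shell mirror of the WDL logic: build up the "Fizz"/"Buzz" pieces,
# then fall back to the number itself when neither matched
# (the ${out:-$n} default is the shell analogue of select_first()).
fizzbuzz_line() {
  n=$1; to_fizz=3; to_buzz=5; out=""
  [ $((n % to_fizz)) -eq 0 ] && out="Fizz"
  [ $((n % to_buzz)) -eq 0 ] && out="${out}Buzz"
  echo "${out:-$n}"
}

fizzbuzz_line 15   # FizzBuzz
fizzbuzz_line 7    # 7
```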
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access; in our scatter, those outputs are only set on iterations where the call actually ran, i.e. where we didn't make a noise instead.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill those in with <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> function returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
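You can see the same newline-trimming behavior directly in the shell: <code>echo</code> appends a trailing newline, and command substitution strips it, just as <code>read_string()</code> does. A quick illustration, unrelated to running the actual task:

```shell
the_number=42

# echo prints "42" followed by a newline, but capturing the output
# with $(...) trims that trailing newline, analogous to WDL's
# read_string(stdout()).
the_string=$(echo "${the_number}")
echo "$the_string"   # 42
```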
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We'll also tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs that let you control each task's resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, optionally suggesting that it should be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need this, but you do need it in WDL 1.1, and Toil doesn't actually send your outputs anywhere yet if you don't have one, so we're going to make one. We need to collect all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
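In Python terms (just an analogy, not part of the workflow), this gathering works like collecting one value per loop iteration:

```python
# Python sketch of scatter-then-gather: inside the loop, `result` is a
# single string; outside the loop, there is one value per iteration.
results = []
for i in range(3):
    result = str(i + 1)   # like `String result = ...` inside the scatter
    results.append(result)
print(results)  # ['1', '2', '3']
```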
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs are reproduced like this.
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code. This happens either because the command is written wrong, or because the tool you are trying to run detected and reported an error itself.
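For intuition, here is a minimal Python sketch (not part of Toil) of what a failing exit status looks like; the command and the <code>oops</code> message are made up:

```python
import subprocess

# A task command that exits nonzero is what triggers CommandFailed.
# Whatever it wrote to stderr is what you would find in stderr.txt.
proc = subprocess.run(["bash", "-c", "echo oops >&2; exit 1"],
                      capture_output=True, text=True)
print(proc.returncode)      # 1 -> Toil would report the task as failed
print(proc.stderr.strip())  # oops
```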
Go up higher in the log until you find lines that look like:
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard error at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stderr.txt:
And
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard output at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stdout.txt:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
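The decode-and-split steps above can be scripted; this is a sketch assuming the <code>toilfile:</code> URI format shown in the log line (<code>jobstore_path</code> is a hypothetical helper, not part of Toil):

```python
from urllib.parse import unquote

def jobstore_path(toilfile_uri: str) -> str:
    """URL-decode a toilfile: URI from a Toil debug log and return the
    stored file's path relative to the job store directory."""
    decoded = unquote(toilfile_uri)
    # The part after the last colon is the job-store-relative path.
    return decoded.rsplit(":", 1)[1]

uri = ("toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob"
       "%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351"
       "%2FSample.chr14.bam/Sample.chr14.bam")
print(jobstore_path(uri))
```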
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
1f296172459b3db121231a9b68912e80d6675add
467
466
2024-01-31T15:28:10Z
Anovak
4
Turn of caching as is demanded
wikitext
text/x-wiki
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should '''not''' run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter '''will not''' look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''', to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
==Configuring your Phoenix Environment==
'''Do not try to store data in your home directory on Phoenix!''' The home directories are meant for code and programs. Any data worth running a workflow on should be in a directory under <code>/private/groups</code>. You will probably need to email the admins to get added to a group so you can create a directory to work in somewhere under <code>/private/groups</code>. Usually you would end up with <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code>.
Remember this path; we will need it later.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. However, since these files can be large, and the home directory quota is only 30 GB, we can't keep these in your home directory. We will need to use the <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code> directory you created earlier.
Make sure that directory is available in your <code>~/.bashrc</code> file by editing this command to fill in your group and user names, and then running it:
echo 'BIG_DATA_DIR=/private/groups/YOURGROUPNAME/YOURUSERNAME' >>~/.bashrc
Then use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${BIG_DATA_DIR}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${BIG_DATA_DIR}/.cache/miniwdl"' >>~/.bashrc
After that, '''log out and log back in again''', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day].
'''If you get errors about mutexes, lock files, or other weird problems with Singularity''', try moving these directories to inside <code>/data/tmp</code> on the individual nodes, or unsetting them and letting Toil use its defaults (and exhaust our Docker pull limits). [https://github.com/DataBiosphere/toil/issues/4654 It is not clear that <code>/private/groups</code> actually implements the necessary file locking correctly.]
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, go to your user directory under <code>/private/groups</code>, and make a directory to work in.
cd /private/groups/YOURGROUPNAME/YOURUSERNAME
mkdir workflow-test
cd workflow-test
Next, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --time=02:00:00 --partition=medium --pty bash -i
This will start a new shell that can run for 2 hours; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
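The escape replacement is ordinary JSON decoding; a small Python check, using strings taken from the output above:

```python
import json

# The runner prints JSON with escapes like \u00f3; decoding the JSON
# gives back the real characters in the filenames and messages.
raw = '["Hello, Mridula Resurrecci\\u00f3n!", "Hello, Gershom \\u0160arlota!"]'
messages = json.loads(raw)
print(messages[0])  # Hello, Mridula Resurrección!
```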
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. Until Toil [https://github.com/DataBiosphere/toil/issues/4775 gets support for data file caching on Slurm], we will also need the <code>--caching false</code> option. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
Additionally, since [https://github.com/DataBiosphere/toil/issues/4686 Toil can't manage Slurm partitions itself], we will use the <code>TOIL_SLURM_ARGS</code> environment variable to tell Toil how long jobs should be allowed to run for (2 hours) and what [[Slurm Queues (Partitions) and Resource Management | partition]] they should go in.
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --caching false --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
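For example, a hypothetical inputs file for this workflow (the values here are made up, and the JSON is sketched by generating it with Python) might set the required input, override a default, and fill the optional input:

```python
import json

# Hypothetical FizzBuzz inputs: item_count is required; the other keys
# override a default and fill the optional input, respectively.
inputs = {
    "FizzBuzz.item_count": 20,
    "FizzBuzz.to_buzz": 7,                      # override the default of 5
    "FizzBuzz.fizzbuzz_override": "FizzBuzz!",  # fill the optional input
}
print(json.dumps(inputs, indent=2))
```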
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
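The same pattern in Python, for intuition (this sketch is not part of the workflow):

```python
# Python analogy for the scatter above: range() builds the numbers and
# the scatter body maps over them, element by element, in parallel.
item_count = 5
numbers = list(range(item_count))      # WDL: range(item_count) -> [0..4]
one_based = [i + 1 for i in numbers]   # the scatter body, per element
print(one_based)  # [1, 2, 3, 4, 5]
```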
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
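To check this logic, the scatter body above can be mirrored in Python (not WDL): <code>select_first</code> here mimics the WDL function, and <code>fizzbuzz_line</code> is a hypothetical helper standing in for one scatter iteration.

```python
def select_first(values):
    """Mimic WDL select_first(): return the first non-null value."""
    return next(v for v in values if v is not None)

def fizzbuzz_line(n, to_fizz=3, to_buzz=5, override=None):
    # Variables from un-taken branches stay None, just like un-executed
    # WDL conditionals leave their declarations null.
    fizz = "Fizz" if n % to_fizz == 0 else None
    fizzbuzz = (select_first([override, "FizzBuzz"])
                if n % to_fizz == 0 and n % to_buzz == 0 else None)
    buzz = "Buzz" if n % to_buzz == 0 else None
    return select_first([fizzbuzz, fizz, buzz, str(n)])

print([fizzbuzz_line(n) for n in range(1, 16)])
```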
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and to output a string <code>the_string</code>. So let's fill those in with <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We'll also tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, optionally suggesting that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need this, but you do need it in 1.1 and Toil doesn't actually send your outputs anywhere yet if you don't have one, so we're going to make one. We need to collect together all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --caching false --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code>, so that the files shipped between jobs are stored somewhere you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs and the output of commands run from WDL are reproduced like this.
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code. This happens either when the command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard error at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stderr.txt:
And
[2023-07-16T16:23:54-0700] [MainThread] [I] [toil.wdl.wdltoil] Standard output at /data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/stdout.txt:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need your tool's input files in order to do this. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
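If you'd rather not paste log lines into a website, the same decoding can be done with Python's standard <code>urllib</code> library. This sketch uses the example URI from the log line above:

```python
from urllib.parse import unquote

# The percent-encoded Toil file URI copied from the log line above.
uri = ("toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob"
       "%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351"
       "%2FSample.chr14.bam/Sample.chr14.bam")

# URL-decode it (%3A becomes ":", %2F becomes "/").
decoded = unquote(uri)
print(decoded)

# The part after the last colon is the path relative to the job store.
print(decoded.rsplit(":", 1)[-1])
```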
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should '''not''' run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter '''will not''' look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''', to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
==Configuring your Phoenix Environment==
'''Do not try and store data in your home directory on Phoenix!''' The home directories are meant for code and programs. Any data worth running a workflow on should be in a directory under <code>/private/groups</code>. You will probably need to email the admins to get added to a group so you can create a directory to work in somewhere under <code>/private/groups</code>. Usually you would end up with <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code>.
Remember this path; we will need it later.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. On the Phoenix cluster, however, we do have a shared filesystem, so we should configure Toil to use it for caching the Docker container images used for running workflow steps. Since these images can be large, and the home directory quota is only 30 GB, we can't keep them in your home directory. We will need to use the <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code> directory you created earlier.
Record that directory in your <code>~/.bashrc</code> file by editing this command (fill in your own group and user names) and running it:
echo 'BIG_DATA_DIR=/private/groups/YOURGROUPNAME/YOURUSERNAME' >>~/.bashrc
Then use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${BIG_DATA_DIR}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${BIG_DATA_DIR}/.cache/miniwdl"' >>~/.bashrc
After that, '''log out and log back in again''', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day].
'''If you get errors about mutexes, lock files, or other weird problems with Singularity''', try moving these directories to inside <code>/data/tmp</code> on the individual nodes, or unsetting them and letting Toil use its defaults (and exhaust our Docker pull limits). [https://github.com/DataBiosphere/toil/issues/4654 It is not clear that <code>/private/groups</code> actually implements the necessary file locking correctly.]
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, go to your user directory under <code>/private/groups</code>, and make a directory to work in.
cd /private/groups/YOURGROUPNAME/YOURUSERNAME
mkdir workflow-test
cd workflow-test
Next, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's put it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
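For example, an inputs file that points at the same list by absolute path would look like this (the directory here is hypothetical; substitute your own):

```json
{
  "hello_caller.who": "/private/groups/YOURGROUPNAME/YOURUSERNAME/workflow-test/names.txt"
}
```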
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --time=02:00:00 --partition=medium --pty bash -i
This will start a new shell that can run for 2 hours; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
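Those escape sequences are ordinary JSON string escapes, so any JSON parser will turn them back into the real characters. For example, in Python:

```python
import json

# The workflow's output JSON escapes non-ASCII characters like ó as \u00f3;
# parsing the JSON restores them.
message = json.loads('"Hello, Mridula Resurrecci\\u00f3n!"')
print(message)  # prints: Hello, Mridula Resurrección!
```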
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. Until Toil [https://github.com/DataBiosphere/toil/issues/4775 gets support for data file caching on Slurm], we will also need the <code>--caching false</code> option. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
Additionally, since [https://github.com/DataBiosphere/toil/issues/4686 Toil can't manage Slurm partitions itself], we will use the <code>TOIL_SLURM_ARGS</code> environment variable to tell Toil how long jobs should be allowed to run for (2 hours) and what [[Slurm Queues (Partitions) and Resource Management | partition]] they should go in.
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --caching false --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
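Given these declarations, a minimal inputs file only has to supply the one required value; every other input falls back to its default, or stays <code>null</code>:

```json
{"FizzBuzz.item_count": 15}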
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
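To make the analogy concrete, here is roughly what that scatter computes, sketched in Python (an analogy only, not how Toil actually executes it; the <code>item_count</code> value is made up):

```python
# Rough Python analogy for the WDL scatter: run the body once per element,
# collecting each declared variable into an array of results.
item_count = 20                        # hypothetical value for FizzBuzz.item_count
numbers = list(range(item_count))      # WDL: range(item_count)
one_based = [i + 1 for i in numbers]   # WDL: the scatter body, per element
print(one_based[:5])                   # prints [1, 2, 3, 4, 5]
```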
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
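For example (with hypothetical variable names), the expression form picks a value inline, while the statement form needs a separate block for each case:

```wdl
# Conditional expression: has a then and an else, picks a value inline.
String parity = if (one_based % 2 == 0) then "even" else "odd"

# Conditional statements: no else branch, so the negated condition
# gets its own block, and each variable is declared only once.
if (one_based % 2 == 0) {
    String even_label = "even"
}
if (one_based % 2 != 0) {
    String odd_label = "odd"
}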
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access; since the call is inside a conditional, the task's outputs will be <code>null</code> whenever we made a noise instead of running it.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
We're also going to add a <code>runtime</code> section to our task, to specify resource requirements, and to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, optionally hinting that it should be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need this, but you do need it in 1.1, and Toil doesn't actually send your outputs anywhere yet if you don't have one, so we're going to make one. We need to collect all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
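If the scatter-to-array semantics are hard to picture, here is a hedged Python sketch of what the workflow as a whole computes. This is not how Toil executes it; each loop iteration mirrors one scatter iteration, <code>None</code> stands in for WDL's null, and the list built up plays the role of the gathered <code>result</code> array:

```python
from typing import Optional

# Python sketch of the FizzBuzz workflow's logic (not how Toil runs it).
def fizzbuzz(item_count: int, to_fizz: int = 3, to_buzz: int = 5,
             fizzbuzz_override: Optional[str] = None) -> list:
    results = []
    for i in range(item_count):
        one_based = i + 1
        fizz = "Fizz" if one_based % to_fizz == 0 else None
        buzz = "Buzz" if one_based % to_buzz == 0 else None
        fizzbuzz_val = None
        if fizz is not None and buzz is not None:
            fizzbuzz_val = fizzbuzz_override or "FizzBuzz"
        # select_first(): take the first non-null candidate.
        candidates = [fizzbuzz_val, fizz, buzz, str(one_based)]
        results.append(next(v for v in candidates if v is not None))
    return results

print(fizzbuzz(15))
```

Running <code>fizzbuzz(15)</code> gives the familiar sequence ending in <code>"FizzBuzz"</code>, matching what the workflow's <code>fizzbuzz_results</code> output should contain.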
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --caching false --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs and the output of commands run from WDL are reproduced like this.
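To pull just those embedded job logs back out of a long main log, an <code>awk</code> range pattern over the marker lines works. This is a generic shell sketch, not a Toil feature; the sample log contents here are made up, but the marker lines are the ones shown above:

```shell
# Build a tiny fake Toil log to demonstrate on (contents are invented).
cat > toil.log <<'EOF'
[2023-07-16T16:23:54-0700] [MainThread] [I] some other logging
=========>
Toil job log is here
<=========
[2023-07-16T16:23:55-0700] [MainThread] [I] more logging
EOF

# Print only the lines between (and including) the marker lines.
awk '/^=========>$/,/^<=========$/' toil.log
```

On a real log this prints each embedded job log, one marker-delimited block at a time.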
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which will happen when either the command line command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stderr follows:
And
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stdout follows:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need your tool's input files in order to do this. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
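The decode-and-split steps can also be done in a few lines of Python instead of a web tool; this is just a sketch using the standard library, applied to the URI from the log line above:

```python
from urllib.parse import unquote

# The toilfile: URI copied from the Toil debug log.
uri = ("toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob"
       "%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351"
       "%2FSample.chr14.bam/Sample.chr14.bam")

# URL-decode the percent escapes (%3A -> ':', %2F -> '/').
decoded = unquote(uri)

# The path relative to the job store is everything after the last colon.
relative_path = decoded.rsplit(":", 1)[1]
print(relative_path)
```

Joining that relative path onto your <code>--jobStore</code> directory gives the on-disk location of the file.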
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
5b18728125ac2936327740c6106f2903ca257792
469
468
2024-02-13T23:01:15Z
Anovak
4
/* Debugging Workflows */ Explain restarting
wikitext
text/x-wiki
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should ''not'' run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter ''will not'' look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''', to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
==Configuring your Phoenix Environment==
'''Do not try and store data in your home directory on Phoenix!''' The home directories are meant for code and programs. Any data worth running a workflow on should be in a directory under <code>/private/groups</code>. You will probably need to email the admins to get added to a group so you can create a directory to work in somewhere under <code>/private/groups</code>. Usually you would end up with <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code>.
Remember this path; we will need it later.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. However, since these files can be large, and the home directory quota is only 30 GB, we can't keep these in your home directory. We will need to use the <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code> directory you created earlier.
Make sure that directory's path is available in your <code>~/.bashrc</code> file by editing this command to use your actual group and user names, then running it:
echo 'BIG_DATA_DIR=/private/groups/YOURGROUPNAME/YOURUSERNAME' >>~/.bashrc
Then use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${BIG_DATA_DIR}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${BIG_DATA_DIR}/.cache/miniwdl"' >>~/.bashrc
After that, '''log out and log back in again''', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day].
'''If you get errors about mutexes, lock files, or other weird problems with Singularity''', try moving these directories to inside <code>/data/tmp</code> on the individual nodes, or unsetting them and letting Toil use its defaults (and exhaust our Docker pull limits). [https://github.com/DataBiosphere/toil/issues/4654 It is not clear that <code>/private/groups</code> actually implements the necessary file locking correctly.]
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, go to your user directory under <code>/private/groups</code>, and make a directory to work in.
cd /private/groups/YOURGROUPNAME/YOURUSERNAME
mkdir workflow-test
cd workflow-test
Next, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
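If you are generating inputs from a script rather than typing them by hand, writing the JSON with a real JSON library avoids quoting mistakes. A minimal Python sketch:

```python
import json

# Keys are "<workflow name>.<input name>"; file values are paths relative
# to the inputs file (absolute paths and URLs also work).
inputs = {"hello_caller.who": "./names.txt"}

with open("inputs.json", "w") as f:
    json.dump(inputs, f)
```

This produces the same <code>inputs.json</code> as the <code>echo</code> command above.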
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --time=02:00:00 --partition=medium --pty bash -i
This will start a new shell that can run for 2 hours; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
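Those <code>\u00f3</code>-style sequences are standard JSON string escapes, and any JSON parser turns them back into the real characters. For example, in Python:

```python
import json

# JSON \uXXXX escapes decode to the actual Unicode characters.
message = json.loads('"Hello, Mridula Resurrecci\\u00f3n!"')
print(message)  # Hello, Mridula Resurrección!
```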
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. Until Toil [https://github.com/DataBiosphere/toil/issues/4775 gets support for data file caching on Slurm], we will also need the <code>--caching false</code> option. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
Additionally, since [https://github.com/DataBiosphere/toil/issues/4686 Toil can't manage Slurm partitions itself], we will use the <code>TOIL_SLURM_ARGS</code> environment variable to tell Toil how long jobs should be allowed to run for (2 hours) and what [[Slurm Queues (Partitions) and Resource Management | partition]] they should go in.
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --caching false --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
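Under the <code>map()</code> comparison above, the Python analogue of this range-plus-scatter pair would be roughly:

```python
# Rough Python analogue: range() produces 0-based indices, and the scatter
# body runs once per element, here just shifting to 1-based numbering.
numbers = list(range(5))              # like: Array[Int] numbers = range(5)
one_based = [i + 1 for i in numbers]  # like the scatter body
print(one_based)  # [1, 2, 3, 4, 5]
```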
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
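A hedged Python rendering of <code>select_first()</code> may make the null handling concrete, with <code>None</code> standing in for WDL's null:

```python
# Sketch of WDL's select_first(): return the first non-null value.
def select_first(values: list):
    return next(v for v in values if v is not None)

print(select_first([None, "FizzBuzz"]))  # FizzBuzz
```

So when <code>fizzbuzz_override</code> is null, the default <code>"FizzBuzz"</code> wins; when it is set, it comes first and is selected.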
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, but only when the call actually ran (that is, when we didn't produce a noise instead).
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill those in with <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
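Conceptually, the <code>read_string(stdout())</code> pair behaves like this Python sketch (an analogy only, not Toil's actual implementation):

```python
# stdout() hands the task's captured standard output to WDL as a file;
# read_string() reads it back and strips trailing newlines.
captured = "7\n"                      # what `echo 7` writes to stdout
the_string = captured.rstrip("\r\n")  # like WDL read_string()
print(the_string)
```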
We're also going to add a <code>runtime</code> section to our task, to specify resource requirements, and tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, optionally requesting that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need one, but you do need it in 1.1, and Toil doesn't actually send your outputs anywhere yet if you don't have one, so we're going to make one. We need to collect all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --caching false --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Restarting the Workflow==
If you think your workflow failed from a transient problem (such as a Docker image not being available) that you have fixed, and you ran the workflow with <code>--jobStore</code> set manually to a directory that persists between attempts, you can add <code>--restart</code> to your workflow command and make Toil try again. It will pick up from where it left off and rerun any failed tasks and then the rest of the workflow.
This ''will not'' pick up any changes to your WDL source code files; those are read once at the beginning and not re-read on restart.
If restarting the workflow doesn't help, you may need to move on to more advanced debugging techniques.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs and the output of commands run from WDL are reproduced like this.
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which will happen either when the command line command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
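You can see exit statuses for yourself; this small Python sketch (illustrative, not part of any workflow) runs two shell commands and inspects their statuses:

```python
import subprocess

# A task command's exit status is what WDL checks: 0 means success,
# anything nonzero means failure.
good = subprocess.run(["sh", "-c", "echo hello"]).returncode
bad = subprocess.run(["sh", "-c", "exit 3"]).returncode
print(good, bad)
```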
Go up higher in the log until you find lines that look like:
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stderr follows:
And
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stdout follows:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
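The decode-and-split can also be done in a couple of lines of Python, without a web-based decoder (shown with the example URI from the log line above):

```python
from urllib.parse import unquote

# The toilfile URI copied from the example debug log line.
uri = ("toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob"
       "%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351"
       "%2FSample.chr14.bam/Sample.chr14.bam")

# URL-decode the percent escapes, then keep everything after the last
# colon: that is the path relative to the job store.
relative_path = unquote(uri).rsplit(":", 1)[-1]
print(relative_path)
```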
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
add1c8435b725cc7bfdf1fc832b7951158bc2554
473
469
2024-03-21T16:23:01Z
Anovak
4
/* Preparing an input file */
wikitext
text/x-wiki
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to the "head node" of the Phoenix cluster. This node is where everyone logs in, but you should '''not''' run actual work on this node; it exists only to give you access to the files on the cluster and to the commands to control cluster jobs.
To connect to the head node:
1. Connect to the VPN.
2. SSH to <code>phoenix.prism</code>. At the command line, run:
ssh phoenix.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@phoenix.prism
The first time you connect, you will see a message like:
The authenticity of host 'phoenix.prism (10.50.1.66)' can't be established.
ED25519 key fingerprint is SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>phoenix.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:SUgdBXgsWwUJXxAz/BpGzlGFLOsFtZzeqQ3kzdl3iuI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter ''will not'' look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''', to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
==Configuring your Phoenix Environment==
'''Do not try and store data in your home directory on Phoenix!''' The home directories are meant for code and programs. Any data worth running a workflow on should be in a directory under <code>/private/groups</code>. You will probably need to email the admins to get added to a group so you can create a directory to work in somewhere under <code>/private/groups</code>. Usually you would end up with <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code>.
Remember this path; we will need it later.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. On the Phoenix cluster, however, we have a shared filesystem, and we should configure Toil to use it for caching the Docker container images used for running workflow steps. Since these files can be large, and the home directory quota is only 30 GB, we can't keep them in your home directory. We will need to use the <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code> directory you created earlier.
Make sure that that directory is recorded in your <code>~/.bashrc</code> file by editing this command to use your actual path and then running it:
echo 'BIG_DATA_DIR=/private/groups/YOURGROUPNAME/YOURUSERNAME' >>~/.bashrc
Then use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${BIG_DATA_DIR}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${BIG_DATA_DIR}/.cache/miniwdl"' >>~/.bashrc
After that, '''log out and log back in again''', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day].
'''If you get errors about mutexes, lock files, or other weird problems with Singularity''', try moving these directories to inside <code>/data/tmp</code> on the individual nodes, or unsetting them and letting Toil use its defaults (and exhaust our Docker pull limits). [https://github.com/DataBiosphere/toil/issues/4654 It is not clear that <code>/private/groups</code> actually implements the necessary file locking correctly.]
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, go to your user directory under <code>/private/groups</code>, and make a directory to work in.
cd /private/groups/YOURGROUPNAME/YOURUSERNAME
mkdir workflow-test
cd workflow-test
Next, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
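If hand-writing JSON feels error-prone, the same inputs file content can be generated with Python's <code>json</code> module, which guarantees valid quoting (a convenience sketch, not a Toil requirement):

```python
import json

# Keys are "<workflow name>.<input name>"; values are the input values.
inputs = {"hello_caller.who": "./names.txt"}
text = json.dumps(inputs)
print(text)
```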
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --time=02:00:00 --partition=medium --pty bash -i
This will start a new shell that can run for 2 hours; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
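Those <code>\u00f3</code>-style sequences are just JSON's ASCII-safe escapes; a quick Python check (using two of the greetings above) decodes them:

```python
import json

# The runner prints non-ASCII characters as JSON \uXXXX escapes;
# json.loads() turns them back into real characters.
line = r'["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!"]'
messages = json.loads(line)
print(messages[0])
```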
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare for a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. Until Toil [https://github.com/DataBiosphere/toil/issues/4775 gets support for data file caching on Slurm], we will also need the <code>--caching false</code> option. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
Additionally, since [https://github.com/DataBiosphere/toil/issues/4686 Toil can't manage Slurm partitions itself], we will use the <code>TOIL_SLURM_ARGS</code> environment variable to tell Toil how long jobs should be allowed to run for (2 hours) and what [[Slurm Queues (Partitions) and Resource Management | partition]] they should go in.
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --caching false --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
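Since a WDL scatter works like a <code>map()</code>, the snippet above is roughly equivalent to this Python (an analogy only; WDL runs the iterations in parallel):

```python
# WDL range() behaves like Python's range(): 0 .. item_count-1.
item_count = 5
numbers = range(item_count)
# The scatter body runs once per element, like a comprehension.
one_based = [i + 1 for i in numbers]
print(one_based)
```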
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
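For contrast, a conditional ''expression'' produces a value inline; this one-liner is a hypothetical illustration (the <code>n</code> and <code>parity</code> variables are not part of our FizzBuzz workflow):

```
Int n = 7
String parity = if (n % 2 == 0) then "even" else "odd"
```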
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
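As an aside, this trailing-newline stripping behaves like Bash command substitution, which also drops the trailing newline when capturing a command's output:

```shell
# Bash command substitution strips the trailing newline, just like read_string():
the_string=$(echo 42)
# Wrapping in brackets shows there is no newline left inside the captured value.
echo "[${the_string}]"
```

This prints <code>[42]</code> on one line, with no stray newline inside the brackets.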
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We're going to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, optionally requesting that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need this, but you do need it in 1.1, and Toil doesn't actually send your outputs anywhere yet if you don't have one, so we're going to make one. We need to collect all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> block, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --caching false --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Restarting the Workflow==
If you think your workflow failed from a transient problem (such as a Docker image not being available) that you have fixed, and you ran the workflow with <code>--jobStore</code> set manually to a directory that persists between attempts, you can add <code>--restart</code> to your workflow command and make Toil try again. It will pick up from where it left off and rerun any failed tasks and then the rest of the workflow.
This ''will not'' pick up any changes to your WDL source code files; those are read once at the beginning and not re-read on restart.
If restarting the workflow doesn't help, you may need to move on to more advanced debugging techniques.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs and the output of commands run from WDL are reproduced like this.
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which will happen either when the command line command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stderr follows:
And
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stdout follows:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
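If you prefer to stay on the command line, you can do the decoding and splitting in one go. This sketch uses Python's standard <code>urllib.parse</code> module; the URI is the example from the log line above.

```shell
# URL-decode a toilfile: URI and pull out the job-store-relative path.
uri='toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam'
# Decode the percent-escapes with Python's standard library.
decoded=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote(sys.argv[1]))' "$uri")
# Everything after the last colon is the path relative to the job store.
relpath=${decoded##*:}
echo "$relpath"
```

You can then prepend your <code>--jobStore</code> directory to <code>$relpath</code> to find the file on disk.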
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
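You can check whether a given value has this problem yourself. The <code>check_xdg</code> function name and the directory below are just demonstration values; on the cluster you would pass <code>"$XDG_RUNTIME_DIR"</code>.

```shell
# Report whether an XDG_RUNTIME_DIR value points at a real directory.
check_xdg() {
    if [ -n "$1" ] && [ ! -d "$1" ]; then
        echo "out of spec: $1 does not exist"
    else
        echo "ok"
    fi
}
# Demonstration value that should not exist on any normal machine.
check_xdg "/run/user/99999-does-not-exist"
```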
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
60ab14c68d40978f9c52e7864bd2b116166d3f1e
Slurm Tips for Toil
0
38
470
441
2024-02-16T19:08:32Z
Anovak
4
wikitext
text/x-wiki
Here are some tips for running Toil workflows on the Phoenix Slurm cluster. Mostly you might want to run WDL workflows, but you can use some of these for other workflows like Cactus. You can also consult [https://github.com/DataBiosphere/toil/blob/master/docs/wdl/running.rst the Toil documentation on WDL workflows].
* Install Toil with WDL support with:
pip3 install --upgrade toil[wdl]
To use a development version of Toil, you can install from source instead:
pip3 install git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl]
Or for a particular branch:
pip3 install git+https://github.com/DataBiosphere/toil.git@issues/123-abc#egg=toil[wdl]
* You will then need to make sure your '''~/.local/bin''' directory is on your PATH. Open up your '''~/.bashrc''' file and add:
export PATH=$PATH:$HOME/.local/bin
Then make sure to log out and back in again.
* For Toil options, you will want '''--batchSystem slurm''' to make it use Slurm and '''--batchLogsDir ./logs''' (or some other location on a shared filesystem) for the Slurm logs to not get lost.
* You may be able to speed up your workflow with '''--caching true''', to cache data on nodes to be shared among multiple simultaneous tasks.
* If using '''toil-wdl-runner''', you might want to add '''--jobStore ./jobStore''' to make sure the job store is in a defined, shared location so that you can use '''--restart''' later.
* If using '''toil-wdl-runner''', you will want to set the '''SINGULARITY_CACHEDIR''' and '''MINIWDL__SINGULARITY__IMAGE_CACHE''' environment variables for your workflow to locations on shared storage, and possibly to the default cache locations in your home directory. Otherwise Toil will set them to node-local directories for each node, and thus re-download images for each workflow run, and for each cluster node. To avoid this, you could, for example, add this before your run or in your '''~/.bashrc''':
export SINGULARITY_CACHEDIR=$HOME/.singularity/cache
export MINIWDL__SINGULARITY__IMAGE_CACHE=$HOME/.cache/miniwdl
b43add8e50f9203265cde9883be4edcf9b77afa3
Cluster Etiquette
0
47
471
431
2024-03-06T17:24:37Z
Anovak
4
Link to the storage visualization
wikitext
text/x-wiki
When running jobs on the cluster, you must be very aware of how those jobs will affect other users.
1: Always test your job by running one first. Just one. Note how much RAM, how many CPU cores and how much time it takes to run. Then, when you submit 50 or 100 of those, you can specify limits in your Slurm batch file on how long the job should run, how much RAM it should use and how many CPU cores it needs. In that case, Slurm can stop jobs that inadvertently go too long or use too many resources.
2: Don't run too many jobs at once if they use a lot of disk I/O. If every job reads in a 100GB file, and you launch 20 of them at the same time, you could bring down the file server serving /private/groups. Run only maybe 5 at once in that case, or introduce a random delay at the start of your jobs. You can limit your concurrent jobs by specifying something like this in your job batch file:
#SBATCH --array=[1-279]%10
inputList=$1
input=$(sed -n "$SLURM_ARRAY_TASK_ID"p $inputList)
some_command $input
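The other approach mentioned above, a random startup delay, can be sketched like this. The 10-second cap is illustrative; scale it up for jobs that read very large files.

```shell
# Stagger job start times so they don't all hit the file server at once.
delay=$(( RANDOM % 10 ))
echo "Sleeping ${delay}s before starting"
sleep "$delay"
# ... then run the real command for this task here.
```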
3: Don't use too much storage. Use http://logserv.gi.ucsc.edu/cgi-bin/private-groups.cgi to look at how your storage use is divided among your directories, and clean up large chunks of data that you do not need.
8a3f92f2ab671c05374adc8dc82fbfac9d85c741
482
471
2024-05-03T03:24:15Z
Weiler
3
wikitext
text/x-wiki
When running jobs on the cluster, you must be very aware of how those jobs will affect other users.
1: Always test your job by running one first. Just one. Note how much RAM, how many CPU cores and how much time it takes to run. Then, when you submit 50 or 100 of those, you can specify limits in your Slurm batch file on how long the job should run, how much RAM it should use and how many CPU cores it needs. In that case, Slurm can stop jobs that inadvertently go too long or use too many resources.
2: Don't run too many jobs at once if they use a lot of disk I/O. If every job reads in a 100GB file, and you launch 20 of them at the same time, you could bring down the file server serving /private/groups. Run only maybe 5 at once in that case, or introduce a random delay at the start of your jobs. You can limit your concurrent jobs by specifying something like this in your job batch file:
#SBATCH --array=[1-279]%10
inputList=$1
input=$(sed -n "$SLURM_ARRAY_TASK_ID"p $inputList)
some_command $input
3: Don't use too much storage. Use http://logserv.gi.ucsc.edu/cgi-bin/private-groups.cgi to look at how your storage use is divided among your directories, and clean up large chunks of data that you do not need.
ba2478796c2c7e3eba71f35ff27389503ee58b65
483
482
2024-05-03T03:25:13Z
Weiler
3
wikitext
text/x-wiki
When running jobs on the cluster, you must be very aware of how those jobs will affect other users.
1: Always test your job by running one first. Just one. Note how much RAM, how many CPU cores and how much time it takes to run. Then, when you submit 50 or 100 of those, you can specify limits in your Slurm batch file on how long the job should run, how much RAM it should use and how many CPU cores it needs. In that case, Slurm can stop jobs that inadvertently go too long or use too many resources.
2: Don't run too many jobs at once if they use a lot of disk I/O. If every job reads in a 100GB file, and you launch 20 of them at the same time, you could bring down the /private/groups filesystem. Run only maybe 5 at once in that case, or introduce a random delay at the start of your jobs. You can limit your concurrent jobs by specifying something like this in your job batch file:
#SBATCH --array=[1-279]%10
inputList=$1
input=$(sed -n "$SLURM_ARRAY_TASK_ID"p $inputList)
some_command $input
3: Don't use too much storage. Use http://logserv.gi.ucsc.edu/cgi-bin/private-groups.cgi to look at how your storage use is divided among your directories, and clean up large chunks of data that you do not need.
543b1e1f5990fe973473a67c430cb5c81be58e7f
AWS Account List and Numbers
0
22
472
424
2024-03-19T18:07:32Z
Weiler
3
wikitext
text/x-wiki
This is a list of our currently available AWS accounts and their account numbers:
ucsc-bd2k : 862902209576
ucsc-toil-dev : 318423852362
ucsc-vg-dev : 781907127277
ucsc-platform-dev : 719818754276
comparative-genomics-dev : 162786355865
nanopore-dev : 270442831226
ucsc-cgp-production : 097093801910
platform-hca-dev : 122796619775
anvil-dev : 608666466534
gi-gateway : 652235167018
pangenomics : 422448306679
braingeneers : 443872533066
ucsctreehouse : 238605363322
ucsc-bisti-dev : 851631505710
ucsc-genome-browser : 784962239183
dockstore-dev : 635220370222
ucsc-spatial : 541180793903
platform-hca-prod : 542754589326
platform-hca-portal : 158963592881
miga-lab : 156518225147
platform-anvil-dev : 289950828509
platform-anvil-prod : 465330168186
platform-anvil-portal : 166384485414
agc-runs : 598929688444
sequencing-center-cold-store : 436140841220
hprc-training : 654654365441
aa43f5df3b8d3a96e4ddf56f11a8172efd2e6994
488
472
2024-05-10T16:16:49Z
Weiler
3
wikitext
text/x-wiki
This is a list of our currently available AWS accounts and their account numbers:
ucsc-bd2k : 862902209576
ucsc-toil-dev : 318423852362
ucsc-vg-dev : 781907127277
ucsc-platform-dev : 719818754276
comparative-genomics-dev : 162786355865
nanopore-dev : 270442831226
ucsc-cgp-production : 097093801910
platform-hca-dev : 122796619775
anvil-dev : 608666466534
gi-gateway : 652235167018
pangenomics : 422448306679
braingeneers : 443872533066
ucsctreehouse : 238605363322
ucsc-bisti-dev : 851631505710
ucsc-genome-browser : 784962239183
dockstore-dev : 635220370222
ucsc-spatial : 541180793903
platform-hca-prod : 542754589326
platform-hca-portal : 158963592881
miga-lab : 156518225147
platform-anvil-dev : 289950828509
platform-anvil-prod : 465330168186
platform-anvil-portal : 166384485414
platform-temp-dev : 654654270592
agc-runs : 598929688444
sequencing-center-cold-store : 436140841220
hprc-training : 654654365441
fb98890fecca1981f140200ec27b12a4be3e9228
Firewalled Environment Storage Overview
0
39
475
393
2024-03-27T17:48:47Z
Weiler
3
wikitext
text/x-wiki
== Storage ==
Our servers mount two types of ''shared'' storage: home directories and group storage directories. These directories mount over the network on all shared compute servers and the phoenix cluster, so any server you log in to will have these filesystems available:
'''Filesystem Specifications'''
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Filesystem<br />
! /private/home
! /private/groups
|-
| style="font-weight:bold; text-align:left;" | Default Soft Quota
| 30 GB
| 15 TB
|-
| style="font-weight:bold; text-align:left;" | Default Hard Quota
| 31 GB
| 16 TB
|-
| style="font-weight:bold; text-align:left;" | Total Capacity
| 19 TB
| 500 TB
|- style="text-align:left;"
| style="font-weight:bold;" | Access Speed
| Very Fast (NVMe Flash Media)
| Very Fast (NVMe Flash Media)
|- style="text-align:left;"
| style="font-weight:bold;" | Intended Use
| This space should be used for login scripts, small bits of code or software repos, etc. No large data should be stored here.
| This space should be used for large computational/shared data, large software installations and the like.
|}
'''Home Directories (/private/home/username)'''
Your home directory will be located at "/private/home/username" and has a 30GB soft quota and a 31GB hard quota. Your home directory is meant for small scripts and login data, or a git repo. Please do not try to store large data there or run large compute jobs using data in your home directory.
'''Groups Directories (/private/groups/groupname)'''
The group storage directories are created per PI, and each group directory has a default 15TB hard quota. For example, if David Haussler is the PI that you report to directly, then the directory would exist as /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each of those group directories is shared by the lab it belongs to, so you must be wary of everyone's data usage and share the 15TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the 'getfattr' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /private/groups/hausslerlab for example, you would do:
$ getfattr -n ceph.dir.rbytes /private/groups/hausslerlab
getfattr: Removing leading '/' from absolute path names
# file: private/groups/hausslerlab
ceph.dir.rbytes="6522955553147"
That number is in bytes. So divide by 1,000,000,000,000 and you get '6.522 TB'. That is how much data is currently being used.
To check the max quota limit, use this command:
$ getfattr -n ceph.quota.max_bytes /private/groups/hausslerlab
getfattr: Removing leading '/' from absolute path names
# file: private/groups/hausslerlab
ceph.quota.max_bytes="15000000000000"
And 15000000000000 divided by 1,000,000,000,000 is 15 TB.
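The arithmetic above is easy to script. This sketch feeds the example byte count from the getfattr output through awk; 1 TB here means 10^12 bytes, matching the quota convention above.

```shell
# Convert a ceph.dir.rbytes value to terabytes (1 TB = 10^12 bytes).
bytes=6522955553147   # example value from the getfattr output above
awk -v b="$bytes" 'BEGIN { printf "%.3f TB\n", b / 1e12 }'   # prints "6.523 TB"
```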
== /data/scratch Space on the Servers ==
Each server will generally have a local /data/scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /data/scratch is not backed up, and the data there could disappear in the event of a disk failure or anything else. Do not store important data there. If it is important, it should be moved somewhere else very soon after creation.
0cc13680fb8380dc6e8398f5aa7cd4b309526b1c
476
475
2024-03-27T18:00:45Z
Weiler
3
/* Storage */
wikitext
text/x-wiki
== Storage ==
Our servers mount two types of ''shared'' storage: home directories and group storage directories. These directories mount over the network on all shared compute servers and the phoenix cluster, so any server you log in to will have these filesystems available:
'''Filesystem Specifications'''
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Filesystem<br />
! /private/home
! /private/groups
|-
| style="font-weight:bold; text-align:left;" | Default Soft Quota
| 30 GB
| 15 TB
|-
| style="font-weight:bold; text-align:left;" | Default Hard Quota
| 31 GB
| 15 TB
|-
| style="font-weight:bold; text-align:left;" | Total Capacity
| 19 TB
| 500 TB
|- style="text-align:left;"
| style="font-weight:bold;" | Access Speed
| Very Fast (NVMe Flash Media)
| Very Fast (NVMe Flash Media)
|- style="text-align:left;"
| style="font-weight:bold;" | Intended Use
| This space should be used for login scripts, small bits of code or software repos, etc. No large data should be stored here.
| This space should be used for large computational/shared data, large software installations and the like.
|}
'''Home Directories (/private/home/username)'''
Your home directory will be located at "/private/home/username" and has a 30GB soft quota and a 31GB hard quota. Your home directory is meant for small scripts and login data, or a git repo. Please do not try to store large data there or run large compute jobs using data in your home directory.
'''Groups Directories (/private/groups/groupname)'''
The group storage directories are created per PI, and each group directory has a default 15TB hard quota. For example, if David Haussler is the PI that you report to directly, then the directory would exist as /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each of those group directories is shared by the lab it belongs to, so you must be wary of everyone's data usage and share the 15TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the 'getfattr' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /private/groups/hausslerlab for example, you would do:
$ getfattr -n ceph.dir.rbytes /private/groups/hausslerlab
getfattr: Removing leading '/' from absolute path names
# file: private/groups/hausslerlab
ceph.dir.rbytes="6522955553147"
That number is in bytes. So divide by 1,000,000,000,000 and you get '6.522 TB'. That is how much data is currently being used.
To check the max quota limit, use this command:
$ getfattr -n ceph.quota.max_bytes /private/groups/hausslerlab
getfattr: Removing leading '/' from absolute path names
# file: private/groups/hausslerlab
ceph.quota.max_bytes="15000000000000"
And 15000000000000 divided by 1,000,000,000,000 is 15 TB.
== /data/scratch Space on the Servers ==
Each server will generally have a local /data/scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /data/scratch is not backed up, and the data there could disappear in the event of a disk failure or anything else. Do not store important data there. If it is important, it should be moved somewhere else very soon after creation.
dbcf1a58eedb9940688ab2364a2928469855b5d7
477
476
2024-03-27T18:01:08Z
Weiler
3
/* Storage */
wikitext
text/x-wiki
== Storage ==
Our servers mount two types of ''shared'' storage: home directories and group storage directories. These directories mount over the network on all shared compute servers and the phoenix cluster, so any server you log in to will have these filesystems available:
'''Filesystem Specifications'''
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Filesystem<br />
! /private/home
! /private/groups
|-
| style="font-weight:bold; text-align:left;" | Default Soft Quota
| 30 GB
| 15 TB
|-
| style="font-weight:bold; text-align:left;" | Default Hard Quota
| 31 GB
| 15 TB
|-
| style="font-weight:bold; text-align:left;" | Total Capacity
| 19 TB
| 800 TB
|- style="text-align:left;"
| style="font-weight:bold;" | Access Speed
| Very Fast (NVMe Flash Media)
| Very Fast (NVMe Flash Media)
|- style="text-align:left;"
| style="font-weight:bold;" | Intended Use
| This space should be used for login scripts, small bits of code or software repos, etc. No large data should be stored here.
| This space should be used for large computational/shared data, large software installations and the like.
|}
'''Home Directories (/private/home/username)'''
Your home directory will be located as "/private/home/username" and has a 30GB soft quota and a 31GB hard quota. Your home directory is meant for small scripts and login data, or a git repo. Please do not try to store large data there or computer on large jobs using data in your home directory.
'''Groups Directories (/private/groups/groupname)'''
The group storage directories are created per PI, and each group directory has a default 15TB hard quota. For example, if David Haussler is the PI that you report to directly, then the directory would exist as /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each of those group directories are shared by the lab it belongs to, so you must be wary of everyone's data usage and share the 15TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the 'getfattr' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /private/groups/hausslerlab for example, you would do:
$ getfattr -n ceph.dir.rbytes /private/groups/hausslerlab
getfattr: Removing leading '/' from absolute path names
# file: private/groups/hausslerlab
ceph.dir.rbytes="6522955553147"
That number is in bytes. So divide by 1,000,000,000,000 and you get '6.522 TB'. That is how much data is currently being used.
To check the max quota limit, use this command:
$ getfattr -n ceph.quota.max_bytes /private/groups/hausslerlab
getfattr: Removing leading '/' from absolute path names
# file: private/groups/hausslerlab
ceph.quota.max_bytes="15000000000000"
And 15000000000000 divided by 1,000,000,000,000 is 15 TB.
== /data/scratch Space on the Servers ==
Each server will generally have a local /data/scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /data/scratch is not backed up, and the data there could disappear in the event of a disk failure or anything else. Do not store important data there. If it is important, it should be moved somewhere else very soon after creation.
9108e2840e61e506f31426ae708bf9c3a232de48
491
477
2024-05-13T16:30:25Z
Weiler
3
wikitext
text/x-wiki
== Storage ==
Our servers mount two types of ''shared'' storage; home directories and group storage directories. These home directories will mount over the network to all shared compute servers and the phoenix cluster, so any server you login to will have these filesystems available:
'''Filesystem Specifications'''
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Filesystem<br />
! /private/home
! /private/groups
|-
| style="font-weight:bold; text-align:left;" | Default Soft Quota
| 30 GB
| 15 TB
|-
| style="font-weight:bold; text-align:left;" | Default Hard Quota
| 31 GB
| 15 TB
|-
| style="font-weight:bold; text-align:left;" | Total Capacity
| 19 TB
| 800 TB
|- style="text-align:left;"
| style="font-weight:bold;" | Access Speed
| Very Fast (NVMe Flash Media)
| Very Fast (NVMe Flash Media)
|- style="text-align:left;"
| style="font-weight:bold;" | Intended Use
| This space should be used for login scripts, small bits of code or software repos, etc. No large data should be stored here.
| This space should be used for large computational/shared data, large software installations and the like.
|}
'''Home Directories (/private/home/username)'''
Your home directory will be located as "/private/home/username" and has a 30GB soft quota and a 31GB hard quota. Your home directory is meant for small scripts and login data, or a git repo. Please do not try to store large data there or computer on large jobs using data in your home directory.
'''Groups Directories (/private/groups/groupname)'''
The group storage directories are created per PI, and each group directory has a default 15TB hard quota. For example, if David Haussler is the PI that you report to directly, then the directory would exist as /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each of those group directories are shared by the lab it belongs to, so you must be wary of everyone's data usage and share the 15TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the 'getfattr' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /private/groups/hausslerlab for example, you would do:
$ getfattr -n ceph.dir.rbytes /private/groups/hausslerlab
getfattr: Removing leading '/' from absolute path names
# file: private/groups/hausslerlab
ceph.dir.rbytes="6522955553147"
That number is in bytes. So divide by 1,000,000,000,000 and you get '6.522 TB'. That is how much data is currently being used.
To check the max quota limit, use this command:
$ getfattr -n ceph.quota.max_bytes /private/groups/hausslerlab
getfattr: Removing leading '/' from absolute path names
# file: private/groups/hausslerlab
ceph.quota.max_bytes="15000000000000"
And 15000000000000 divided by 1,000,000,000,000 is 15 TB.
== Storage Quota Alerting ==
If you and/or folks in your lab would like an automated alert when the /private/groups/labname quota is getting to a certain percentage of fullness, we can set that up for you and others in your lab. Just email cluster-admin@soe.ucsc.edu with the following information:
1: Which directory you would like to watch quotas on (i.e. /private/groups/somelab)
2: What % full you would like an email alert at
3: What email addresses you want on the alert list
After setup, our alerting system will alert folks on that email list every 4 hours until the quota in question is reduced to an amount under the alerting % threshold you asked for. So it is a bit noisy, but will force folks to delete data in order to stop the alerts. When the system notices that the quota usage has decreased to under the alert threshold, you will receive one final email with an "OK" notification that things are OK now.
== /data/scratch Space on the Servers ==
Each server will generally have a local /data/scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /data/scratch is not backed up, and the data there could disappear in the event of a disk failure or anything else. Do not store important data there. If it is important, it should be moved somewhere else very soon after creation.
a623ff379efd6e0fa4f395832ae6a1d924a64a3b
492
491
2024-05-13T16:31:28Z
Weiler
3
/* Storage Quota Alerting */
wikitext
text/x-wiki
== Storage ==
Our servers mount two types of ''shared'' storage; home directories and group storage directories. These home directories will mount over the network to all shared compute servers and the phoenix cluster, so any server you login to will have these filesystems available:
'''Filesystem Specifications'''
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Filesystem<br />
! /private/home
! /private/groups
|-
| style="font-weight:bold; text-align:left;" | Default Soft Quota
| 30 GB
| 15 TB
|-
| style="font-weight:bold; text-align:left;" | Default Hard Quota
| 31 GB
| 15 TB
|-
| style="font-weight:bold; text-align:left;" | Total Capacity
| 19 TB
| 800 TB
|- style="text-align:left;"
| style="font-weight:bold;" | Access Speed
| Very Fast (NVMe Flash Media)
| Very Fast (NVMe Flash Media)
|- style="text-align:left;"
| style="font-weight:bold;" | Intended Use
| This space should be used for login scripts, small bits of code or software repos, etc. No large data should be stored here.
| This space should be used for large computational/shared data, large software installations and the like.
|}
'''Home Directories (/private/home/username)'''
Your home directory will be located as "/private/home/username" and has a 30GB soft quota and a 31GB hard quota. Your home directory is meant for small scripts and login data, or a git repo. Please do not try to store large data there or computer on large jobs using data in your home directory.
'''Groups Directories (/private/groups/groupname)'''
The group storage directories are created per PI, and each group directory has a default 15TB hard quota. For example, if David Haussler is the PI that you report to directly, then the directory would exist as /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each of those group directories are shared by the lab it belongs to, so you must be wary of everyone's data usage and share the 15TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the 'getfattr' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /private/groups/hausslerlab for example, you would do:
$ getfattr -n ceph.dir.rbytes /private/groups/hausslerlab
getfattr: Removing leading '/' from absolute path names
# file: private/groups/hausslerlab
ceph.dir.rbytes="6522955553147"
That number is in bytes. So divide by 1,000,000,000,000 and you get '6.522 TB'. That is how much data is currently being used.
To check the max quota limit, use this command:
$ getfattr -n ceph.quota.max_bytes /private/groups/hausslerlab
getfattr: Removing leading '/' from absolute path names
# file: private/groups/hausslerlab
ceph.quota.max_bytes="15000000000000"
And 15000000000000 divided by 1,000,000,000,000 is 15 TB.
== Storage Quota Alerting ==
If you and/or folks in your lab would like an automated alert when the /private/groups/labname quota is getting to a certain percentage of fullness, we can set that up for you and others in your lab. Just email '''cluster-admin@soe.ucsc.edu''' with the following information:
1: Which directory you would like to watch quotas on (i.e. /private/groups/somelab)
2: What % full you would like an email alert at
3: What email addresses you want on the alert list
After setup, our alerting system will alert folks on that email list ''every 4 hours'' until the quota in question is reduced to an amount under the alerting % threshold you asked for. So it is a bit noisy, but will force folks to delete data in order to stop the alerts. When the system notices that the quota usage has decreased to under the alert threshold, you will receive one final email with an "OK" notification that things are OK now.
== /data/scratch Space on the Servers ==
Each server will generally have a local /data/scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /data/scratch is not backed up, and the data there could disappear in the event of a disk failure or anything else. Do not store important data there. If it is important, it should be moved somewhere else very soon after creation.
4c847065b805be2a55dbd1bcf9548a019b2ab5ed
493
492
2024-05-14T18:21:34Z
Weiler
3
wikitext
text/x-wiki
== Storage ==
Our servers mount two types of ''shared'' storage; home directories and group storage directories. These home directories will mount over the network to all shared compute servers and the phoenix cluster, so any server you login to will have these filesystems available:
'''Filesystem Specifications'''
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Filesystem<br />
! /private/home
! /private/groups
|-
| style="font-weight:bold; text-align:left;" | Default Soft Quota
| 30 GB
| 15 TB
|-
| style="font-weight:bold; text-align:left;" | Default Hard Quota
| 31 GB
| 15 TB
|-
| style="font-weight:bold; text-align:left;" | Total Capacity
| 19 TB
| 800 TB
|- style="text-align:left;"
| style="font-weight:bold;" | Access Speed
| Very Fast (NVMe Flash Media)
| Very Fast (NVMe Flash Media)
|- style="text-align:left;"
| style="font-weight:bold;" | Intended Use
| This space should be used for login scripts, small bits of code or software repos, etc. No large data should be stored here.
| This space should be used for large computational/shared data, large software installations and the like.
|}
'''Home Directories (/private/home/username)'''
Your home directory will be located as "/private/home/username" and has a 30GB soft quota and a 31GB hard quota. Your home directory is meant for small scripts and login data, or a git repo. Please do not try to store large data there or computer on large jobs using data in your home directory.
'''Groups Directories (/private/groups/groupname)'''
The group storage directories are created per PI, and each group directory has a default 15TB hard quota. For example, if David Haussler is the PI that you report to directly, then the directory would exist as /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each of those group directories are shared by the lab it belongs to, so you must be wary of everyone's data usage and share the 15TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the 'getfattr' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /private/groups/hausslerlab for example, you would do:
$ getfattr -n ceph.dir.rbytes /private/groups/hausslerlab
getfattr: Removing leading '/' from absolute path names
# file: private/groups/hausslerlab
ceph.dir.rbytes="6522955553147"
That number is in bytes. So divide by 1,000,000,000,000 and you get '6.522 TB'. That is how much data is currently being used.
To check the max quota limit, use this command:
$ getfattr -n ceph.quota.max_bytes /private/groups/hausslerlab
getfattr: Removing leading '/' from absolute path names
# file: private/groups/hausslerlab
ceph.quota.max_bytes="15000000000000"
And 15000000000000 divided by 1,000,000,000,000 is 15 TB.
== Storage Quota Alerting ==
If you and/or folks in your lab would like an automated alert when the /private/groups/labname quota is getting to a certain percentage of fullness, we can set that up for you and others in your lab. Just email '''cluster-admin@soe.ucsc.edu''' with the following information:
1: Which directory you would like to watch quotas on (i.e. /private/groups/somelab)
2: What % full you would like an email alert at
3: What email addresses you want on the alert list
After setup, our alerting system will alert folks on that email list ''every 4 hours'' until the quota in question is reduced to an amount under the alerting % threshold you asked for. So it is a bit noisy, but will force folks to delete data in order to stop the alerts. When the system notices that the quota usage has decreased to under the alert threshold, you will receive one final email with an "OK" notification that things are OK now.
== /data/scratch Space on the Servers ==
Each server will generally have a local /data/scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /data/scratch is not backed up, and the data there could disappear in the event of a disk failure or anything else. Do not store important data there. If it is important, it should be moved somewhere else very soon after creation.
== Backups ==
/private/groups is backed up weekly on Friday nights (which usually takes several days to complete). Please not that the following directories in the tree '''WILL NOT''' be backed up:
tmp/
temp/
TMP/
TEMP/
cache/
.cache/
*.tmp/
So if you have data that you know isn't important and should be excluded from the backups, put them in a directory suffixed with ".tmp". Such as this example:
/private/groups/clusteradmin/mybams.tmp/
51a2bc970e10bbc14cf33117a1779b3bd8a0b0e8
494
493
2024-05-14T18:21:55Z
Weiler
3
/* Backups */
wikitext
text/x-wiki
== Storage ==
Our servers mount two types of ''shared'' storage; home directories and group storage directories. These home directories will mount over the network to all shared compute servers and the phoenix cluster, so any server you login to will have these filesystems available:
'''Filesystem Specifications'''
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Filesystem<br />
! /private/home
! /private/groups
|-
| style="font-weight:bold; text-align:left;" | Default Soft Quota
| 30 GB
| 15 TB
|-
| style="font-weight:bold; text-align:left;" | Default Hard Quota
| 31 GB
| 15 TB
|-
| style="font-weight:bold; text-align:left;" | Total Capacity
| 19 TB
| 800 TB
|- style="text-align:left;"
| style="font-weight:bold;" | Access Speed
| Very Fast (NVMe Flash Media)
| Very Fast (NVMe Flash Media)
|- style="text-align:left;"
| style="font-weight:bold;" | Intended Use
| This space should be used for login scripts, small bits of code or software repos, etc. No large data should be stored here.
| This space should be used for large computational/shared data, large software installations and the like.
|}
'''Home Directories (/private/home/username)'''
Your home directory will be located as "/private/home/username" and has a 30GB soft quota and a 31GB hard quota. Your home directory is meant for small scripts and login data, or a git repo. Please do not try to store large data there or computer on large jobs using data in your home directory.
'''Groups Directories (/private/groups/groupname)'''
The group storage directories are created per PI, and each group directory has a default 15TB hard quota. For example, if David Haussler is the PI that you report to directly, then the directory would exist as /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each of those group directories are shared by the lab it belongs to, so you must be wary of everyone's data usage and share the 15TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the 'getfattr' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /private/groups/hausslerlab for example, you would do:
$ getfattr -n ceph.dir.rbytes /private/groups/hausslerlab
getfattr: Removing leading '/' from absolute path names
# file: private/groups/hausslerlab
ceph.dir.rbytes="6522955553147"
That number is in bytes. So divide by 1,000,000,000,000 and you get '6.522 TB'. That is how much data is currently being used.
To check the max quota limit, use this command:
$ getfattr -n ceph.quota.max_bytes /private/groups/hausslerlab
getfattr: Removing leading '/' from absolute path names
# file: private/groups/hausslerlab
ceph.quota.max_bytes="15000000000000"
And 15000000000000 divided by 1,000,000,000,000 is 15 TB.
== Storage Quota Alerting ==
If you and/or folks in your lab would like an automated alert when the /private/groups/labname quota is getting to a certain percentage of fullness, we can set that up for you and others in your lab. Just email '''cluster-admin@soe.ucsc.edu''' with the following information:
1: Which directory you would like to watch quotas on (i.e. /private/groups/somelab)
2: What % full you would like an email alert at
3: What email addresses you want on the alert list
After setup, our alerting system will alert folks on that email list ''every 4 hours'' until the quota in question is reduced to an amount under the alerting % threshold you asked for. So it is a bit noisy, but will force folks to delete data in order to stop the alerts. When the system notices that the quota usage has decreased to under the alert threshold, you will receive one final email with an "OK" notification that things are OK now.
== /data/scratch Space on the Servers ==
Each server will generally have a local /data/scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /data/scratch is not backed up, and the data there could disappear in the event of a disk failure or anything else. Do not store important data there. If it is important, it should be moved somewhere else very soon after creation.
== Backups ==
/private/groups is backed up weekly on Friday nights (which usually takes several days to complete). Please note that the following directories in the tree '''WILL NOT''' be backed up:
tmp/
temp/
TMP/
TEMP/
cache/
.cache/
*.tmp/
So if you have data that you know isn't important and should be excluded from the backups, put them in a directory suffixed with ".tmp". Such as this example:
/private/groups/clusteradmin/mybams.tmp/
c93ec8f5cd28d6fb596538f5456188661e6108ae
495
494
2024-05-14T18:26:05Z
Weiler
3
/* Backups */
wikitext
text/x-wiki
== Storage ==
Our servers mount two types of ''shared'' storage: home directories and group storage directories. Both mount over the network to all shared compute servers and the phoenix cluster, so any server you log in to will have these filesystems available:
'''Filesystem Specifications'''
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Filesystem<br />
! /private/home
! /private/groups
|-
| style="font-weight:bold; text-align:left;" | Default Soft Quota
| 30 GB
| 15 TB
|-
| style="font-weight:bold; text-align:left;" | Default Hard Quota
| 31 GB
| 15 TB
|-
| style="font-weight:bold; text-align:left;" | Total Capacity
| 19 TB
| 800 TB
|- style="text-align:left;"
| style="font-weight:bold;" | Access Speed
| Very Fast (NVMe Flash Media)
| Very Fast (NVMe Flash Media)
|- style="text-align:left;"
| style="font-weight:bold;" | Intended Use
| This space should be used for login scripts, small bits of code or software repos, etc. No large data should be stored here.
| This space should be used for large computational/shared data, large software installations and the like.
|}
'''Home Directories (/private/home/username)'''
Your home directory is located at /private/home/username and has a 30 GB soft quota and a 31 GB hard quota. Your home directory is meant for small scripts and login data, or a git repo. Please do not store large data there or compute on large jobs using data in your home directory.
'''Groups Directories (/private/groups/groupname)'''
The group storage directories are created per PI, and each group directory has a default 15 TB hard quota. For example, if David Haussler is the PI that you report to directly, the directory would exist as /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15 TB available per group accordingly.
On the compute servers you can check your group's current quota usage with the 'getfattr' command. You can only check the quota of a group you are part of (you must be a member of the UNIX group of the same name). For example, to check the quota usage of /private/groups/hausslerlab:
$ getfattr -n ceph.dir.rbytes /private/groups/hausslerlab
getfattr: Removing leading '/' from absolute path names
# file: private/groups/hausslerlab
ceph.dir.rbytes="6522955553147"
That number is in bytes. So divide by 1,000,000,000,000 and you get roughly '6.52 TB'. That is how much data is currently being used.
To check the max quota limit, use this command:
$ getfattr -n ceph.quota.max_bytes /private/groups/hausslerlab
getfattr: Removing leading '/' from absolute path names
# file: private/groups/hausslerlab
ceph.quota.max_bytes="15000000000000"
And 15000000000000 divided by 1,000,000,000,000 is 15 TB.
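The arithmetic above is easy to script. This is just a convenience sketch using the example byte counts from this page (not live getfattr output):

```shell
# Example byte counts copied from the getfattr output above.
used_bytes=6522955553147
quota_bytes=15000000000000

# Divide by 10^12 to convert bytes to TB (decimal units, as on this page).
used_tb=$(awk -v b="$used_bytes" 'BEGIN { printf "%.2f", b / 1e12 }')
quota_tb=$(awk -v b="$quota_bytes" 'BEGIN { printf "%.2f", b / 1e12 }')
echo "using ${used_tb} TB of ${quota_tb} TB"
```

In practice you would feed the numbers in from the two getfattr commands shown above instead of hard-coding them.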
== Storage Quota Alerting ==
If you and/or folks in your lab would like an automated alert when the /private/groups/labname quota reaches a certain percentage of capacity, we can set that up for you and others in your lab. Just email '''cluster-admin@soe.ucsc.edu''' with the following information:
1: Which directory you would like to watch quotas on (e.g. /private/groups/somelab)
2: What % full you would like an email alert at
3: What email addresses you want on the alert list
After setup, our alerting system will email folks on that list ''every 4 hours'' until the quota usage drops back under the alerting % threshold you asked for. It is a bit noisy by design, which encourages folks to delete data in order to stop the alerts. When the system notices that usage has decreased to under the threshold, you will receive one final email with an "OK" notification.
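The threshold check amounts to simple integer arithmetic. A rough sketch (the 90% threshold and the byte counts below are just example values, not your lab's real settings):

```shell
threshold=90                 # alert when usage reaches this % of quota
used=6522955553147           # example usage in bytes, from this page
quota=15000000000000         # example 15 TB quota in bytes

pct=$(( used * 100 / quota ))   # integer percent full
if [ "$pct" -ge "$threshold" ]; then
    echo "ALERT: ${pct}% full"
else
    echo "OK: ${pct}% full"
fi
```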
== /data/scratch Space on the Servers ==
Each server will generally have a local /data/scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /data/scratch is not backed up, and the data there could disappear in the event of a disk failure or other problem. Do not store important data there. If it is important, it should be moved somewhere else very soon after creation.
== Backups ==
/private/groups is backed up weekly on Friday nights (which usually takes several days to complete). Please note that the following directories in the tree '''WILL NOT''' be backed up:
tmp/
temp/
TMP/
TEMP/
cache/
.cache/
scratch/
*.tmp/
So if you have data that you know isn't important and should be excluded from the backups, put it in a directory suffixed with ".tmp", for example:
/private/groups/clusteradmin/mybams.tmp/
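To see which directories in your own tree would be skipped, a find expression matching the exclusion patterns works. This sketch builds a small throwaway demo tree; in practice you would point find at your real /private/groups/yourlab directory instead:

```shell
# Throwaway demo tree; substitute your real group directory in practice.
base=$(mktemp -d)
mkdir -p "$base/results" "$base/cache" "$base/mybams.tmp"

# Match the backup-exclusion patterns listed above.
find "$base" -type d \( -name tmp -o -name temp -o -name TMP -o -name TEMP \
    -o -name cache -o -name .cache -o -name scratch -o -name '*.tmp' \)
```

Here the cache/ and mybams.tmp/ directories are printed (they would not be backed up), while results/ is not.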
== Visual Studio Code (vscode) Configuration Tweaks ==
Visual Studio Code (vscode) is a popular IDE, and many people use its remote functionality to edit code on a remote server, which is very cool. The issue with vscode is that it frequently opens far too many files on the remote server in an attempt to cache search databases and code modifications. That puts an unnecessary burden on the remote server's kernel, which must cache all of those file handles, slowing down the remote filesystem and the server you are working on.
The fix seems to be to edit this file on the remote server:
~/.vscode-server/data/Machine/settings.json
Create that file if it does not already exist. Then put the following text in it:
{
"search.exclude": {
"**/node_modules": true,
"**/bower_components": true,
"**/env": true,
"**/venv": true
},
"files.watcherExclude": {
"**/.git/objects/**": true,
"**/.git/subtree-cache/**": true,
"**/node_modules/*/**": true,
"**/.cache/**": true,
"**/.conda/**": true,
"**/.local/**": true,
"**/.nextflow/**": true,
"**/env/**": true,
"**/venv/**": true,
"**/work/**": true,
"**/private/groups/**": true
}
}
Then close vscode. The most reliable way is to click the bottom-left corner where it usually says "SSH: mustard.prism" (or whatever server you are connected to). That opens the dropdown from the search bar, and at the bottom there is an action: "Close Remote Connection". Otherwise vscode sometimes reconnects or stays connected in the background. Then save any open files you have in vscode, close vscode on your laptop or workstation, and re-open it.
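On the remote server you can create the file if needed and sanity-check the JSON before reconnecting. This is just a convenience sketch, with the path taken from above; a syntax error in settings.json can cause vscode to ignore the whole file:

```shell
cfg="$HOME/.vscode-server/data/Machine/settings.json"
mkdir -p "$(dirname "$cfg")"

# Write a minimal empty settings file only if one does not exist yet,
# so an existing configuration is never clobbered.
[ -f "$cfg" ] || printf '{\n}\n' > "$cfg"

# Validate the JSON with the stock Python json module.
python3 -m json.tool "$cfg" > /dev/null && echo "settings.json OK"
```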
== Firewalled Computing Resources Overview ==
== Doing Work and Computing ==
When doing research, running jobs and the like, please be careful of your resource consumption on the server you are on. Don't run so many threads or processes at once that you overrun the available RAM or disk IO. If you are not sure of your potential RAM, CPU or disk impact, start small with one or two processes and work your way up from there. Also, before running your jobs, check what else is already happening on the server by using the 'top' command to see who and what is running and what resources are already being consumed. If, after starting a process, you realize that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
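Standard Linux tools give a quick pre-flight view of a server before you launch anything; a sketch using stock utilities and /proc:

```shell
# CPU cores available on this machine.
echo "cores: $(nproc)"

# Total and available memory, from /proc/meminfo (kB -> GB).
awk '/^MemTotal|^MemAvailable/ {printf "%s %.1f GB\n", $1, $2/1048576}' /proc/meminfo

# Load averages (1, 5, 15 minutes); compare the first against the core count.
cut -d' ' -f1-3 /proc/loadavg
```

A 1-minute load average near or above the core count means the machine is already saturated; use 'top' to see who is running what.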
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the shared compute servers behind the VPN. The DNS suffix for all machines is ".prism". So, "mustard" would have a full DNS name of "mustard.prism":
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Node Name
! Operating System<br />
! CPU Cores
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | mustard
| style="text-align:left;" | Ubuntu 22.04
| 160
| 1.5 TB
| 10 Gb/s
| 9 TB
|-
| style="text-align:left;" | emerald
| style="text-align:left;" | Ubuntu 22.04
| 64
| 1 TB
| 10 Gb/s
| 690 GB
|-
| style="text-align:left;" | crimson
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|-
| style="text-align:left;" | razzmatazz
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|}
These ''shared'' servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== The Firewall ==
All servers in this environment are behind a firewall, so you must connect to the VPN in order to access them; they are not reachable from the greater Internet without it. You can still make outbound connections from them to other servers on the Internet - to copy data in, sync git repos, and the like - as it is only inbound connections that are blocked. All machines behind the firewall have the private domain name suffix "*.prism".
== The Phoenix Cluster ==
This is a cluster of ~22 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2 TB of RAM and 128 cores, although the cluster is heterogeneous and has multiple node types. You interact with the Phoenix Cluster via the Slurm job scheduler. You must specifically request access to use Slurm on the Phoenix Cluster; email '''cluster-admin@soe.ucsc.edu''' for access.
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold;"
! Node Name
! Operating System<br />
! CPU Cores
! GPUs/Type
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | phoenix-00
| style="text-align:left;" | Ubuntu 22.04
| 256
| 8 / Nvidia A100
| 1 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[01-05]
| style="text-align:left;" | Ubuntu 22.04
| 128
| 8 / Nvidia A5500
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[06-08]
| style="text-align:left;" | Ubuntu 22.04
| 128
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[11-21]
| style="text-align:left;" | Ubuntu 22.04
| 128
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|}
The cluster head node is '''phoenix.prism'''. You do not need to be logged into phoenix.prism to submit Slurm jobs; jobs can be submitted from any interactive compute server (mustard, emerald, razzmatazz, or crimson). To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
For scratch on the cluster, TMPDIR will be set to /data/tmp. That area is cleaned often so don't store any data there that isn't being used by your jobs.
61a85b3b432a3ef7364b05bbd1c4f86594c5cc41
490
484
2024-05-11T15:08:21Z
Weiler
3
wikitext
text/x-wiki
== Doing Work and Computing ==
When doing research or running jobs, please be mindful of your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are unsure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Also, before running your jobs, check what is already happening on the server with the 'top' command to see who and what is running and how much of the machine's resources are already in use. If, after starting a process, you find that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the shared compute servers behind the VPN. The DNS suffix for all machines is ".prism". So, "mustard" would have a full DNS name of "mustard.prism":
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Node Name
! Operating System<br />
! CPU Cores
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | mustard
| style="text-align:left;" | Ubuntu 22.04
| 160
| 1.5 TB
| 10 Gb/s
| 9 TB
|-
| style="text-align:left;" | emerald
| style="text-align:left;" | Ubuntu 22.04
| 64
| 1 TB
| 10 Gb/s
| 690 GB
|-
| style="text-align:left;" | crimson
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|-
| style="text-align:left;" | razzmatazz
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|}
These ''shared'' servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== The Firewall ==
All servers in this environment are behind a firewall, so you must connect to the VPN in order to access them; they are not reachable from the greater Internet without it. You can still make outbound connections from them to other servers on the Internet - to copy data in, sync git repos, and the like - as it is only inbound connections that are blocked. All machines behind the firewall have the private domain name suffix "*.prism".
== The Phoenix Cluster ==
This is a cluster of ~22 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2 TB of RAM and 128 cores, although the cluster is heterogeneous and has multiple node types. You interact with the Phoenix Cluster via the Slurm job scheduler. You must specifically request access to use Slurm on the Phoenix Cluster; email '''cluster-admin@soe.ucsc.edu''' for access.
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold;"
! Node Name
! Operating System<br />
! CPU Cores
! GPUs/Type
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | phoenix-00
| style="text-align:left;" | Ubuntu 22.04
| 256
| 8 / Nvidia A100
| 1 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[01-05]
| style="text-align:left;" | Ubuntu 22.04
| 128
| 8 / Nvidia A5500
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[06-08]
| style="text-align:left;" | Ubuntu 22.04
| 128
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[11-21]
| style="text-align:left;" | Ubuntu 22.04
| 128
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|}
The cluster head node is '''phoenix.prism'''. However, you cannot log in directly to phoenix.prism (this protects the scheduler from errant or runaway jobs); instead, submit jobs from any of the interactive compute servers (mustard, emerald, razzmatazz, or crimson). To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
For scratch on the cluster, TMPDIR will be set to /data/tmp. That area is cleaned often so don't store any data there that isn't being used by your jobs.
2f0c5689b2fc019cf4203e3a541e983118a6dff7
498
490
2024-06-10T18:21:06Z
Weiler
3
/* The Phoenix Cluster */
wikitext
text/x-wiki
== Doing Work and Computing ==
When doing research or running jobs, please be mindful of your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are unsure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Also, before running your jobs, check what is already happening on the server with the 'top' command to see who and what is running and how much of the machine's resources are already in use. If, after starting a process, you find that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the shared compute servers behind the VPN. The DNS suffix for all machines is ".prism". So, "mustard" would have a full DNS name of "mustard.prism":
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Node Name
! Operating System<br />
! CPU Cores
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | mustard
| style="text-align:left;" | Ubuntu 22.04
| 160
| 1.5 TB
| 10 Gb/s
| 9 TB
|-
| style="text-align:left;" | emerald
| style="text-align:left;" | Ubuntu 22.04
| 64
| 1 TB
| 10 Gb/s
| 690 GB
|-
| style="text-align:left;" | crimson
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|-
| style="text-align:left;" | razzmatazz
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|}
These ''shared'' servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== The Firewall ==
All servers in this environment are behind a firewall, so you must connect to the VPN in order to access them; they are not reachable from the greater Internet without it. You can still make outbound connections from them to other servers on the Internet - to copy data in, sync git repos, and the like - as it is only inbound connections that are blocked. All machines behind the firewall have the private domain name suffix "*.prism".
== The Phoenix Cluster ==
This is a cluster of ~22 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2 TB of RAM and 128 cores, although the cluster is heterogeneous and has multiple node types. You interact with the Phoenix Cluster via the Slurm job scheduler. You must specifically request access to use Slurm on the Phoenix Cluster; email '''cluster-admin@soe.ucsc.edu''' for access.
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold;"
! Node Name
! Operating System<br />
! CPU Cores
! GPUs/Type
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | phoenix-00
| style="text-align:left;" | Ubuntu 22.04
| 256
| 8 / Nvidia A100
| 1 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[01-05]
| style="text-align:left;" | Ubuntu 22.04
| 128
| 8 / Nvidia A5500
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[06-08]
| style="text-align:left;" | Ubuntu 22.04
| 256
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[09-10]
| style="text-align:left;" | Ubuntu 22.04
| 384
| N/A
| 2.3 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[11-21]
| style="text-align:left;" | Ubuntu 22.04
| 256
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[22-24]
| style="text-align:left;" | Ubuntu 22.04
| 384
| N/A
| 2.3 TB
| 10 Gb/s
| 16 TB NVMe
|}
The cluster head node is '''phoenix.prism'''. However, you cannot log in directly to phoenix.prism (this protects the scheduler from errant or runaway jobs); instead, submit jobs from any of the interactive compute servers (mustard, emerald, razzmatazz, or crimson). To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
For scratch on the cluster, TMPDIR will be set to /data/tmp. That area is cleaned often so don't store any data there that isn't being used by your jobs.
c904979eb1d7363210d6ec9cf8b6ba601ae5a6a0
499
498
2024-06-10T18:21:25Z
Weiler
3
/* The Phoenix Cluster */
wikitext
text/x-wiki
== Doing Work and Computing ==
When doing research or running jobs, please be mindful of your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are unsure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Also, before running your jobs, check what is already happening on the server with the 'top' command to see who and what is running and how much of the machine's resources are already in use. If, after starting a process, you find that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the shared compute servers behind the VPN. The DNS suffix for all machines is ".prism". So, "mustard" would have a full DNS name of "mustard.prism":
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Node Name
! Operating System<br />
! CPU Cores
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | mustard
| style="text-align:left;" | Ubuntu 22.04
| 160
| 1.5 TB
| 10 Gb/s
| 9 TB
|-
| style="text-align:left;" | emerald
| style="text-align:left;" | Ubuntu 22.04
| 64
| 1 TB
| 10 Gb/s
| 690 GB
|-
| style="text-align:left;" | crimson
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|-
| style="text-align:left;" | razzmatazz
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|}
These ''shared'' servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== The Firewall ==
All servers in this environment are behind a firewall, so you must connect to the VPN in order to access them; they are not reachable from the greater Internet without it. You can still make outbound connections from them to other servers on the Internet - to copy data in, sync git repos, and the like - as it is only inbound connections that are blocked. All machines behind the firewall have the private domain name suffix "*.prism".
== The Phoenix Cluster ==
This is a cluster of ~22 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2 TB of RAM and 128 cores, although the cluster is heterogeneous and has multiple node types. You interact with the Phoenix Cluster via the Slurm job scheduler. You must specifically request access to use Slurm on the Phoenix Cluster; email '''cluster-admin@soe.ucsc.edu''' for access.
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold;"
! Node Name
! Operating System<br />
! CPU Cores
! GPUs/Type
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | phoenix-00
| style="text-align:left;" | Ubuntu 22.04
| 256
| 8 / Nvidia A100
| 1 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[01-05]
| style="text-align:left;" | Ubuntu 22.04
| 256
| 8 / Nvidia A5500
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[06-08]
| style="text-align:left;" | Ubuntu 22.04
| 256
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[09-10]
| style="text-align:left;" | Ubuntu 22.04
| 384
| N/A
| 2.3 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[11-21]
| style="text-align:left;" | Ubuntu 22.04
| 256
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[22-24]
| style="text-align:left;" | Ubuntu 22.04
| 384
| N/A
| 2.3 TB
| 10 Gb/s
| 16 TB NVMe
|}
The cluster head node is '''phoenix.prism'''. However, you cannot log in directly to phoenix.prism (this protects the scheduler from errant or runaway jobs); instead, submit jobs from any of the interactive compute servers (mustard, emerald, razzmatazz, or crimson). To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
For scratch on the cluster, TMPDIR will be set to /data/tmp. That area is cleaned often so don't store any data there that isn't being used by your jobs.
a55b470229379015ba5c96a8b000078a65729060
500
499
2024-06-10T18:22:00Z
Weiler
3
/* The Phoenix Cluster */
wikitext
text/x-wiki
== Doing Work and Computing ==
When doing research or running jobs, please be mindful of your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are unsure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Also, before running your jobs, check what is already happening on the server with the 'top' command to see who and what is running and how much of the machine's resources are already in use. If, after starting a process, you find that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the shared compute servers behind the VPN. The DNS suffix for all machines is ".prism". So, "mustard" would have a full DNS name of "mustard.prism":
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Node Name
! Operating System<br />
! CPU Cores
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | mustard
| style="text-align:left;" | Ubuntu 22.04
| 160
| 1.5 TB
| 10 Gb/s
| 9 TB
|-
| style="text-align:left;" | emerald
| style="text-align:left;" | Ubuntu 22.04
| 64
| 1 TB
| 10 Gb/s
| 690 GB
|-
| style="text-align:left;" | crimson
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|-
| style="text-align:left;" | razzmatazz
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|}
These ''shared'' servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== The Firewall ==
All servers in this environment are behind a firewall, so you must connect to the VPN in order to access them; they are not reachable from the greater Internet without it. You can still make outbound connections from them to other servers on the Internet - to copy data in, sync git repos, and the like - as it is only inbound connections that are blocked. All machines behind the firewall have the private domain name suffix "*.prism".
== The Phoenix Cluster ==
This is a cluster of ~22 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2 TB of RAM and 256 cores, although the cluster is heterogeneous and has multiple node types. You interact with the Phoenix Cluster via the Slurm job scheduler. You must specifically request access to use Slurm on the Phoenix Cluster; email '''cluster-admin@soe.ucsc.edu''' for access.
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold;"
! Node Name
! Operating System<br />
! CPU Cores
! GPUs/Type
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | phoenix-00
| style="text-align:left;" | Ubuntu 22.04
| 256
| 8 / Nvidia A100
| 1 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[01-05]
| style="text-align:left;" | Ubuntu 22.04
| 256
| 8 / Nvidia A5500
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[06-08]
| style="text-align:left;" | Ubuntu 22.04
| 256
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[09-10]
| style="text-align:left;" | Ubuntu 22.04
| 384
| N/A
| 2.3 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[11-21]
| style="text-align:left;" | Ubuntu 22.04
| 256
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[22-24]
| style="text-align:left;" | Ubuntu 22.04
| 384
| N/A
| 2.3 TB
| 10 Gb/s
| 16 TB NVMe
|}
The cluster head node is '''phoenix.prism'''. However, you cannot log in directly to phoenix.prism (this protects the scheduler from errant or runaway jobs); instead, submit jobs from any of the interactive compute servers (mustard, emerald, razzmatazz, or crimson). To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
For scratch on the cluster, TMPDIR will be set to /data/tmp. That area is cleaned often so don't store any data there that isn't being used by your jobs.
1c1c418cf200966df761786b94ea9436a5a9fd2b
501
500
2024-06-10T18:22:52Z
Weiler
3
/* The Phoenix Cluster */
wikitext
text/x-wiki
== Doing Work and Computing ==
When doing research or running jobs, please be mindful of your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are unsure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Also, before running your jobs, check what is already happening on the server with the 'top' command to see who and what is running and how much of the machine's resources are already in use. If, after starting a process, you find that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the shared compute servers behind the VPN. The DNS suffix for all machines is ".prism". So, "mustard" would have a full DNS name of "mustard.prism":
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Node Name
! Operating System<br />
! CPU Cores
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | mustard
| style="text-align:left;" | Ubuntu 22.04
| 160
| 1.5 TB
| 10 Gb/s
| 9 TB
|-
| style="text-align:left;" | emerald
| style="text-align:left;" | Ubuntu 22.04
| 64
| 1 TB
| 10 Gb/s
| 690 GB
|-
| style="text-align:left;" | crimson
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|-
| style="text-align:left;" | razzmatazz
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|}
These ''shared'' servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== The Firewall ==
All servers in this environment are behind a firewall, so you must connect to the VPN in order to access them; they are not reachable from the greater Internet without it. You can still make outbound connections from them to other servers on the Internet - to copy data in, sync git repos, and the like - as it is only inbound connections that are blocked. All machines behind the firewall have the private domain name suffix "*.prism".
== The Phoenix Cluster ==
This is a cluster of ~22 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2 TB of RAM and 256 cores, although the cluster is heterogeneous and has multiple node types. You interact with the Phoenix Cluster via the Slurm job scheduler. You must specifically request access to use Slurm on the Phoenix Cluster; email '''cluster-admin@soe.ucsc.edu''' for access.
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold;"
! Node Name
! Operating System<br />
! CPU Cores
! GPUs/Type
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | phoenix-00
| style="text-align:left;" | Ubuntu 22.04
| 256
| 8 / Nvidia A100
| 1 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[01-05]
| style="text-align:left;" | Ubuntu 22.04
| 256
| 8 / Nvidia A5500
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[06-08]
| style="text-align:left;" | Ubuntu 22.04
| 256
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[09-10]
| style="text-align:left;" | Ubuntu 22.04
| 384
| N/A
| 2.3 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[11-21]
| style="text-align:left;" | Ubuntu 22.04
| 256
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[22-24]
| style="text-align:left;" | Ubuntu 22.04
| 384
| N/A
| 2.3 TB
| 10 Gb/s
| 16 TB NVMe
|}
The cluster head node is '''phoenix.prism'''. However, you cannot log in directly to phoenix.prism (this protects the scheduler from errant or runaway jobs); instead, submit jobs from any of the interactive compute servers (mustard, emerald, razzmatazz, or crimson). To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
For scratch on the cluster, TMPDIR will be set to /data/tmp (which is local to each cluster node). That area is cleaned often so don't store any data there that isn't being used by your jobs.
c33beea9778fc5df5aba7888010da0710b5fa899
502
501
2024-06-10T18:23:11Z
Weiler
3
/* The Phoenix Cluster */
wikitext
text/x-wiki
== Doing Work and Computing ==
When doing research or running jobs, please be mindful of your resource consumption on the server you are using. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are unsure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Also, before running your jobs, check what is already happening on the server with the 'top' command to see who and what is running and how much of the machine's resources are already in use. If, after starting a process, you find that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
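A quick pre-flight check along the lines described above might look like this (all standard Linux tools, nothing server-specific):

```shell
# How many CPU cores does this machine have?
nproc

# Memory in use vs. available, in human-readable units.
free -h

# One-shot snapshot of overall load and the busiest processes.
top -b -n 1 | head -n 15
```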
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the shared compute servers behind the VPN. The DNS suffix for all machines is ".prism". So, "mustard" would have a full DNS name of "mustard.prism":
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Node Name
! Operating System<br />
! CPU Cores
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | mustard
| style="text-align:left;" | Ubuntu 22.04
| 160
| 1.5 TB
| 10 Gb/s
| 9 TB
|-
| style="text-align:left;" | emerald
| style="text-align:left;" | Ubuntu 22.04
| 64
| 1 TB
| 10 Gb/s
| 690 GB
|-
| style="text-align:left;" | crimson
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|-
| style="text-align:left;" | razzmatazz
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|}
These ''shared'' servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
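With the VPN up, connecting is an ordinary ssh session. A ~/.ssh/config entry like the following (the username is a placeholder for your own UNIX account name) saves typing the ".prism" suffix each time:

<pre>
# ~/.ssh/config -- "yourusername" is a placeholder
Host mustard emerald crimson razzmatazz
    HostName %h.prism
    User yourusername
</pre>

After that, "ssh mustard" connects to mustard.prism.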
== The Firewall ==
All servers in this environment are behind a firewall, so you must connect to the VPN in order to access them; they are not reachable from the greater Internet without it. You can still make outbound connections from them to other servers on the Internet - to copy data in, sync git repos, and the like - as it is only inbound connections that are blocked. All machines behind the firewall have the private domain name suffix "*.prism".
== The Phoenix Cluster ==
This is a cluster of 25 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2 TB of RAM and 256 cores, although the cluster is heterogeneous and has multiple node types. You interact with the Phoenix Cluster via the Slurm job scheduler. You must specifically request access to use Slurm on the Phoenix Cluster; email '''cluster-admin@soe.ucsc.edu''' for access.
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold;"
! Node Name
! Operating System<br />
! CPU Cores
! GPUs/Type
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | phoenix-00
| style="text-align:left;" | Ubuntu 22.04
| 256
| 8 / Nvidia A100
| 1 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[01-05]
| style="text-align:left;" | Ubuntu 22.04
| 256
| 8 / Nvidia A5500
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[06-08]
| style="text-align:left;" | Ubuntu 22.04
| 256
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[09-10]
| style="text-align:left;" | Ubuntu 22.04
| 384
| N/A
| 2.3 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[11-21]
| style="text-align:left;" | Ubuntu 22.04
| 256
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[22-24]
| style="text-align:left;" | Ubuntu 22.04
| 384
| N/A
| 2.3 TB
| 10 Gb/s
| 16 TB NVMe
|}
The cluster head node is '''phoenix.prism'''. However, you cannot log in directly to phoenix.prism (this protects the scheduler from errant or runaway jobs); instead, submit jobs from any of the interactive compute servers (mustard, emerald, razzmatazz, or crimson). To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
For scratch on the cluster, TMPDIR will be set to /data/tmp (which is local to each cluster node). That area is cleaned often so don't store any data there that isn't being used by your jobs.
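As a sketch of how $TMPDIR fits into a job (the partition-free resource requests below are made-up placeholders, not site-confirmed values - see the Slurm wiki page above for real guidance), a minimal batch script might look like this. The snippet just writes the script out and syntax-checks it with bash -n, so no Slurm is needed to try it:

```shell
# Write a minimal Slurm batch script; resource requests are
# hypothetical placeholders, not site-confirmed values.
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=01:00:00

# TMPDIR points at node-local /data/tmp on the cluster; stage work
# there and copy results back, since scratch is cleaned often.
cd "$TMPDIR"
echo "running on $(hostname) in $PWD"
# ... real work here, then copy outputs back to group storage ...
EOF

# Syntax-check the script locally (no scheduler required for this).
bash -n job.sh && echo "job.sh parses OK"
```

On an interactive server the script would then be submitted with "sbatch job.sh".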
7087543fbdc2b5689562019ebe7fc55ed9f89888
Firewalled User Account and Storage Cost
0
43
485
367
2024-05-07T17:38:14Z
Weiler
3
wikitext
text/x-wiki
== Account and Storage Cost ==
The cost of an active UNIX account is listed in this document:
https://sites.google.com/view/ucscgenomicsinstitute/finance/recharge-services-rates?authuser=0
As of this writing, the rates look like this:
{| class="wikitable"
|- style="font-weight:bold; text-align:center;"
! Service
! Cost
|-
| UNIX User Account per Month
| style="text-align:center;" | $35.34
|-
| OpenStack User Account per Month
| style="text-align:center;" | $35.34
|-
| TB of Storage per Month
| style="text-align:center;" | Currently free, but may change one day
|}
The sponsor of each user and owner of each /private/groups/labname area provides a FOAPAL to our finance group to cover the monthly cost of these resources.
6c65e66b0d118321792926cc4b39ac807a533318
Requirements for dbGaP Access
0
19
486
235
2024-05-07T17:53:44Z
Weiler
3
wikitext
text/x-wiki
If you need NIH dbGaP access, there are several requirements for gaining access - please complete all of these requirements '''BEFORE''' requesting dbGaP credentials. NOTE: If you already have GI VPN access to the GI "Prism" environment, then you have already completed the requirements detailed below - let Haifang Telc (haifang@ucsc.edu) know and we can quickly move to getting you set up.
Please use this checklist to make sure that you have completed all '''three''' requirements.
'''1'''. Your PI's info and your PI's approval
'''2'''. NIH Public Security Refresher Course Certificate
'''3'''. Signed NIH Genomic Data Sharing Policy Agreement
'''1''': You are required to ask your PI or sponsor to email '''cluster-admin@soe.ucsc.edu''' requesting dbGaP access for you - this email should include:
Your name
Your PI's name
PI's approval for this access
'''2''': You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it to the GI Grants Team. You must complete the course in a single continuous sitting in order to be able to print the certificate at the end:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2020 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
'''3''': Please print and read the entire NIH Genomic Data Sharing Policy agreement (download link below), sign the last page, then scan and email the executed document to haifang@ucsc.edu with a subject line that includes "NIH GDS document". By signing the document you agree that you have read and understand the policies described therein and that you agree to abide by them:
[[Media:NIH_GDS_Policy.pdf]]
4608d9b477742ec0348020de6c23ad757dae02ba
487
486
2024-05-07T17:53:56Z
Weiler
3
wikitext
text/x-wiki
If you need NIH dbGaP access, there are several requirements for gaining it - please complete all of them '''BEFORE''' requesting dbGaP credentials. NOTE: If you already have GI VPN access to the GI "Prism" Environment, then you have already completed the requirements detailed below - let Haifang Telc (haifang@ucsc.edu) know and we can quickly move to getting you set up.
Please use this checklist to make sure that you have completed all '''three''' requirements.
'''1'''. Your PI's info and your PI's approval
'''2'''. NIH Public Security Refresher Course Certificate
'''3'''. Signed NIH Genomic Data Sharing Policy Agreement
'''1''': You are required to ask your PI or sponsor to email '''cluster-admin@soe.ucsc.edu''' requesting dbGaP access for you - this email should include:
Your name
Your PI's name
PI's approval for this access
'''2''': You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it to the GI Grants Team. You must complete the course in a single continuous sitting in order to be able to print the certificate at the end:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2020 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
'''3''': Please print and read the entire NIH Genomic Data Sharing Policy agreement (download link below), sign the last page, then scan and email the executed document to haifang@ucsc.edu with a subject line that includes "NIH GDS document". By signing the document you agree that you have read and understand the policies described therein and that you agree to abide by them:
[[Media:NIH_GDS_Policy.pdf]]
45444eadaf1f22ed6b157d079f3200d84403b9c1
Overview of using Slurm
0
32
489
445
2024-05-11T15:06:33Z
Weiler
3
wikitext
text/x-wiki
When using Slurm, you will need to log into one of the interactive compute servers in the PRISM area (such as emerald, mustard, crimson or razzmatazz). Once you have ssh'd in, you can execute Slurm batch or interactive commands.
You might also want to consult the [[Quick Reference Guide]].
== Submit a Slurm Batch Job ==
To submit a Slurm batch job, you will need a directory that you have read and write access to on all the nodes (which will often be a shared space). Let's say I have a batch named "experiment-1". I would create that directory in my group's area:
% mkdir -p /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1
% cd /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1
Then you will need to create your job submission batch file. It will look something like this. My file is called 'slurm-test.sh':
% vim slurm-test.sh
Then populate the file as necessary:
#!/bin/bash
# Job name:
#SBATCH --job-name=weiler_test
#
# Partition - This is the queue it goes in:
#SBATCH --partition=short
#
# Where to send email (optional)
#SBATCH --mail-user=weiler@ucsc.edu
#
# Number of nodes you need per job:
#SBATCH --nodes=1
#
# Memory needed for the jobs. Try very hard to make this accurate. DEFAULT = 4gb
#SBATCH --mem=4gb
#
# Number of tasks (one for each CPU desired for use case) (example):
#SBATCH --ntasks=1
#
# Processors per task:
# At least eight times the number of GPUs needed for nVidia RTX A5500
#SBATCH --cpus-per-task=1
#
# Number of GPUs, this can be in the format of "--gres=gpu:[1-8]", or "--gres=gpu:A5500:[1-8]" with the type included (optional)
#SBATCH --gres=gpu:1
#
# Standard output and error log
#SBATCH --output=serial_test_%j.log
#
# Wall clock limit in hrs:min:sec:
#SBATCH --time=00:00:30
#
## Command(s) to run (example):
pwd; hostname; date
echo "Running test script on a single CPU core"
sleep 5
echo "Test done!"
date
Keep the "SBATCH" lines commented; the scheduler will read them anyway. If you don't need a particular option, just leave it out of the file.
To submit the batch job:
% sbatch slurm-test.sh
Submitted batch job 7
The job(s) will then be scheduled. You can see the state of the queue as such:
% squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
7 batch weiler_t weiler R 0:07 1 phoenix-01
The job will write any STDOUT or STDERR to the directory you launched it from. Beyond that, it will do whatever the job script does, even if it produces no output.
== Launching Several Jobs at Once ==
You can launch many jobs at once using the $SLURM_ARRAY_TASK_ID variable. Add something like the following to your batch submission file:
#SBATCH --array=0-31
#SBATCH --output=array_job_%A_task_%a.out
#SBATCH --error=array_job_%A_task_%a.err
## Command(s) to run:
echo "I am task $SLURM_ARRAY_TASK_ID"
== CGROUPS and Resource Management ==
Our installation of Slurm uses Linux cgroups, which put a hard resource cap on jobs. If you declare that your job needs 4GB of RAM and it uses 5GB, it will fail with an OOM exception. Likewise with CPU or GPU resources: if your job ends up using more than you specify, it will fail. The same applies to the "--time" batch file option: your job will fail if it takes longer than the time you specify there. This keeps the nodes from crashing under runaway jobs that use more resources than you think they will.
So... TEST YOUR JOBS! Find out how many resources a single job needs before you launch 100 of them.
== TEST YOUR JOBS! ==
Let me say that one more time: test your jobs before launching a bunch of them! If a job fails, you don't want it to fail 100 or more times. Testing also gives you a good idea of how much RAM and CPU each job needs, so you can define your batch files accurately.
d9c28dd6110a4fa3ec15538c1abd2146eea69a76
496
489
2024-05-17T21:00:35Z
Weiler
3
/* CGROUPS and Resource Management */
wikitext
text/x-wiki
When using Slurm, you will need to log into one of the interactive compute servers in the PRISM area (such as emerald, mustard, crimson or razzmatazz). Once you have ssh'd in, you can execute Slurm batch or interactive commands.
You might also want to consult the [[Quick Reference Guide]].
== Submit a Slurm Batch Job ==
To submit a Slurm batch job, you will need a directory that you have read and write access to on all the nodes (which will often be a shared space). Let's say I have a batch named "experiment-1". I would create that directory in my group's area:
% mkdir -p /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1
% cd /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1
Then you will need to create your job submission batch file. It will look something like this. My file is called 'slurm-test.sh':
% vim slurm-test.sh
Then populate the file as necessary:
#!/bin/bash
# Job name:
#SBATCH --job-name=weiler_test
#
# Partition - This is the queue it goes in:
#SBATCH --partition=short
#
# Where to send email (optional)
#SBATCH --mail-user=weiler@ucsc.edu
#
# Number of nodes you need per job:
#SBATCH --nodes=1
#
# Memory needed for the jobs. Try very hard to make this accurate. DEFAULT = 4gb
#SBATCH --mem=4gb
#
# Number of tasks (one for each CPU desired for use case) (example):
#SBATCH --ntasks=1
#
# Processors per task:
# At least eight times the number of GPUs needed for nVidia RTX A5500
#SBATCH --cpus-per-task=1
#
# Number of GPUs, this can be in the format of "--gres=gpu:[1-8]", or "--gres=gpu:A5500:[1-8]" with the type included (optional)
#SBATCH --gres=gpu:1
#
# Standard output and error log
#SBATCH --output=serial_test_%j.log
#
# Wall clock limit in hrs:min:sec:
#SBATCH --time=00:00:30
#
## Command(s) to run (example):
pwd; hostname; date
echo "Running test script on a single CPU core"
sleep 5
echo "Test done!"
date
Keep the "SBATCH" lines commented; the scheduler will read them anyway. If you don't need a particular option, just leave it out of the file.
To submit the batch job:
% sbatch slurm-test.sh
Submitted batch job 7
The job(s) will then be scheduled. You can see the state of the queue as such:
% squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
7 batch weiler_t weiler R 0:07 1 phoenix-01
The job will write any STDOUT or STDERR to the directory you launched it from. Beyond that, it will do whatever the job script does, even if it produces no output.
== Launching Several Jobs at Once ==
You can launch many jobs at once using the $SLURM_ARRAY_TASK_ID variable. Add something like the following to your batch submission file:
#SBATCH --array=0-31
#SBATCH --output=array_job_%A_task_%a.out
#SBATCH --error=array_job_%A_task_%a.err
## Command(s) to run:
echo "I am task $SLURM_ARRAY_TASK_ID"
== CGROUPS and Resource Management ==
Our installation of Slurm uses Linux cgroups, which put a hard resource cap on jobs. If you declare that your job needs 4GB of RAM and it uses 5GB, it will fail with an OOM exception. Likewise with CPU or GPU resources: if your job ends up using more than you specify, it will fail. The same applies to the "--time" batch file option: your job will fail if it takes longer than the time you specify there. This keeps the nodes from crashing under runaway jobs that use more resources than you think they will.
So... TEST YOUR JOBS! Find out how many resources a single job needs before you launch 100 of them.
== TEST YOUR JOBS! ==
Let me say that one more time: test your jobs before launching a bunch of them! If a job fails, you don't want it to fail 100 or more times. Testing also gives you a good idea of how much RAM and CPU each job needs, so you can define your batch files accurately.
6fff83983ab62bc31b3fde21c4dbe0b0d362988c
497
496
2024-05-17T21:01:33Z
Weiler
3
/* Submit a Slurm Batch Job */
wikitext
text/x-wiki
When using Slurm, you will need to log into one of the interactive compute servers in the PRISM area (such as emerald, mustard, crimson or razzmatazz). Once you have ssh'd in, you can execute Slurm batch or interactive commands.
You might also want to consult the [[Quick Reference Guide]].
== Submit a Slurm Batch Job ==
To submit a Slurm batch job, you will need a directory that you have both read and write access to on all the nodes (which will often be a shared space). Let's say I have a batch named "experiment-1". I would create that directory in my group's area:
% mkdir -p /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1
% cd /private/groups/clusteradmin/weiler/slurm-jobs/experiment-1
Then you will need to create your job submission batch file. It will look something like this. My file is called 'slurm-test.sh':
% vim slurm-test.sh
Then populate the file as necessary:
#!/bin/bash
# Job name:
#SBATCH --job-name=weiler_test
#
# Partition - This is the queue it goes in:
#SBATCH --partition=short
#
# Where to send email (optional)
#SBATCH --mail-user=weiler@ucsc.edu
#
# Number of nodes you need per job:
#SBATCH --nodes=1
#
# Memory needed for the jobs. Try very hard to make this accurate. DEFAULT = 4gb
#SBATCH --mem=4gb
#
# Number of tasks (one for each CPU desired for use case) (example):
#SBATCH --ntasks=1
#
# Processors per task:
# At least eight times the number of GPUs needed for nVidia RTX A5500
#SBATCH --cpus-per-task=1
#
# Number of GPUs, this can be in the format of "--gres=gpu:[1-8]", or "--gres=gpu:A5500:[1-8]" with the type included (optional)
#SBATCH --gres=gpu:1
#
# Standard output and error log
#SBATCH --output=serial_test_%j.log
#
# Wall clock limit in hrs:min:sec:
#SBATCH --time=00:00:30
#
## Command(s) to run (example):
pwd; hostname; date
echo "Running test script on a single CPU core"
sleep 5
echo "Test done!"
date
Keep the "SBATCH" lines commented; the scheduler will read them anyway. If you don't need a particular option, just leave it out of the file.
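Because the "#SBATCH" directives are ordinary shell comments, the same file also runs as a plain bash script, which is handy for a quick local check of the non-SBATCH parts before submitting. A minimal sketch (the job name and message are made up):

```shell
#!/bin/bash
# Lines beginning with #SBATCH are comments to bash, so bash skips them;
# sbatch, however, parses them as scheduler directives.
#SBATCH --job-name=demo
MSG="runs fine outside Slurm too"
echo "$MSG"
```

Running it with plain bash just prints the message; submitting it with sbatch additionally applies the directives.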
To submit the batch job:
% sbatch slurm-test.sh
Submitted batch job 7
The job(s) will then be scheduled. You can see the state of the queue as such:
% squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
7 batch weiler_t weiler R 0:07 1 phoenix-01
The job will write any STDOUT or STDERR to the directory you launched it from. Beyond that, it will do whatever the job script does, even if it produces no output.
== Launching Several Jobs at Once ==
You can launch many jobs at once using the $SLURM_ARRAY_TASK_ID variable. Add something like the following to your batch submission file:
#SBATCH --array=0-31
#SBATCH --output=array_job_%A_task_%a.out
#SBATCH --error=array_job_%A_task_%a.err
## Command(s) to run:
echo "I am task $SLURM_ARRAY_TASK_ID"
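A common pattern is to use the task ID to pick one input per task. A minimal sketch, with a hypothetical file-naming scheme, that also runs outside Slurm by defaulting the ID to 0:

```shell
#!/bin/bash
# Outside a Slurm array job, SLURM_ARRAY_TASK_ID is unset; default to 0 for local testing.
TASK_ID="${SLURM_ARRAY_TASK_ID:-0}"
# Hypothetical naming scheme: one input file per array task.
INPUT="sample_${TASK_ID}.fastq"
echo "Task $TASK_ID processing $INPUT"
```

With `#SBATCH --array=0-31`, each of the 32 tasks would see its own ID and therefore its own input file.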
== CGROUPS and Resource Management ==
Our installation of Slurm uses Linux cgroups, which put a hard resource cap on jobs. If you declare that your job needs 4GB of RAM and it uses 5GB, it will fail with an OOM exception. Likewise with CPU or GPU resources: if your job ends up using more than you specify, it will fail. The same applies to the "--time" batch file option: your job will fail if it takes longer than the time you specify there. This keeps the nodes from crashing under runaway jobs that use more resources than you think they will.
So... TEST YOUR JOBS! Find out how many resources a single job needs before you launch 100 of them.
== TEST YOUR JOBS! ==
Let me say that one more time: test your jobs before launching a bunch of them! If a job fails, you don't want it to fail 100 or more times. Testing also gives you a good idea of how much RAM and CPU each job needs, so you can define your batch files accurately.
7f6cbeb5122b5e03ffbb35d35e9b7e423965e79e
GPU Resources
0
36
503
432
2024-06-28T16:32:15Z
Anovak
4
Add partition and time
wikitext
text/x-wiki
When submitting jobs, you can ask for GPUs in one of two ways. One is:
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
That will ask for 1 GPU generically on a node with a free GPU. This request is more specific:
#SBATCH --partition=gpu
#SBATCH --gres=gpu:A5500:3
That requests 3 A5500 GPUs '''only'''.
We have several GPU types on the cluster which may fit your specific needs:
nVidia RTX A5500 : 24GB RAM
nVidia A100 : 80GB RAM
For the most part, Slurm takes care of making sure that each job only sees and uses the GPUs assigned to it. Within the job, '''CUDA_VISIBLE_DEVICES''' will be set in the environment, but it will always be set to a list of your requested number of GPUs, starting at 0: Slurm renumbers the GPUs assigned to each job so that, within the job, they appear to start at 0. If you need access to the "real" GPU numbers (to log, or to pass along to Docker), they are available in the '''SLURM_JOB_GPUS''' (for '''sbatch''') or '''SLURM_STEP_GPUS''' (for '''srun''') environment variable.
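For example, a job script can log both views of its GPU assignment. A hypothetical sketch (the example values in the comments are illustrative, not guaranteed):

```shell
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:2
# Renumbered view: within the job this always starts at 0, e.g. "0,1"
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
# Physical device indices on the node, e.g. "2,3"; these are what you
# would pass along to Docker
echo "SLURM_JOB_GPUS=$SLURM_JOB_GPUS"
```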
==Running GPU Workloads==
To actually use an nVidia GPU, you need to run a program that uses the CUDA API. There are a few ways to obtain such a program.
===Prebuilt CUDA Applications===
The Slurm cluster nodes have the nVidia drivers installed, as well as basic CUDA tools like nvidia-smi.
Some projects, such as tensorflow, may ship pre-built binaries that can use CUDA. You should be able to run these binaries directly, if you download them.
===Building CUDA Applications===
The cluster nodes do not have the full CUDA Toolkit. In particular, they do not have the '''nvcc''' CUDA-enabled compiler. If you want to compile applications that use CUDA, you will need to install the development environment yourself for your user.
Once you have '''nvcc''' available to your user, building CUDA applications should work. To run them, you will have to submit them as jobs, because the head node does not have a GPU.
===Containerized GPU Workloads===
Instead of directly installing binaries, or installing and using the CUDA Toolkit, it is often easiest to use containers to download a prebuilt GPU workload and all of its libraries and dependencies. There are a few options for running containerized GPU workloads on the cluster.
====Running Containers in Singularity====
You can run containers on the cluster using Singularity, and give them access to the GPUs that Slurm has selected using the '''--nv''' option. For example:
singularity pull docker://tensorflow/tensorflow:latest-gpu
srun -c 8 --mem 10G --partition=gpu --time=00:20:00 --gres=gpu:1 singularity run --nv docker://tensorflow/tensorflow:latest-gpu python -c 'from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())'
This will produce output showing that the Tensorflow container is indeed able to talk to one GPU:
INFO: Using cached SIF image
2023-05-15 11:36:33.110850: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-15 11:36:38.799035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /device:GPU:0 with 22244 MB memory: -> device: 0, name: NVIDIA RTX A5500, pci bus id: 0000:03:00.0, compute capability: 8.6
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 8527638019084870106
xla_global_id: -1
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 23324655616
locality {
bus_id: 1
links {
}
}
incarnation: 1860154623440434360
physical_device_desc: "device: 0, name: NVIDIA RTX A5500, pci bus id: 0000:03:00.0, compute capability: 8.6"
xla_global_id: 416903419
]
Slurm's confinement of the job to its assigned set of GPUs is also passed through to the Singularity container; there is no need to specifically direct Singularity to use the right GPUs unless you are doing something unusual.
====Running Containers in Slurm====
Slurm itself also supports a '''--container''' option for jobs, which allows a whole job to be run inside a container. If you are able to [https://slurm.schedmd.com/containers.html convert your container to OCI Bundle format], you can pass it directly to Slurm instead of using Singularity from inside the job. However, Docker-compatible image specifiers can't be given to Slurm, only paths to OCI bundles on disk.
Stand-alone tools to download a Docker image from Docker Hub in OCI bundle format ('''skopeo''' and '''umoci''') are not yet installed on the cluster, but the method using the '''docker''' command should work.
Slurm containers ''should'' have access to their assigned GPUs, but it is not clear if tools like '''nvidia-smi''' are injected into the container, as they would be with Singularity or the nVidia Container Runtime.
====Running Containers in Docker====
You might be used to running containers with Docker, or containerized GPU workloads with the nVidia Container Runtime or Toolkit. Docker is installed on all the nodes and the daemon is running; if the '''docker''' command does not work for you, ask cluster-admin to add you to the right groups.
The '''nvidia''' runtime is set up and will automatically be used.
While Slurm configures each Slurm job with a cgroup that directs it to the correct GPUs, '''using Docker to run another container escapes Slurm's confinement''', and using '''--gpus=1''' will ''always'' use the ''first'' GPU in the system, whether that GPU is assigned to your job or not. When using Docker, you ''must'' consult the '''SLURM_JOB_GPUS''' (for '''sbatch''') or '''SLURM_STEP_GPUS''' (for '''srun''') environment variable and pass that along to your container. You should also impose limits on all other resources used by your Docker container, so that your whole job stays within the resources allocated by Slurm's scheduler. (TODO: find out how cgroups handles oversubscription between a Docker container and the Slurm container that launched it).
An example of a working command is:
srun -c 1 --mem 4G --partition=gpu --time=00:20:00 --gres=gpu:2 bash -c 'docker run --rm --gpus=\"device=$SLURM_STEP_GPUS\" nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi'
Note that the double-quotes are included in the argument to '''--gpus''' as seen by the Docker client, and that '''bash''' and single-quotes are used to ensure that '''$SLURM_STEP_GPUS''' is evaluated within the job itself, and not on the head node.
3a2677bb8c2ccb24541409ed3a88e8238ab1d970
Slurm Queues (Partitions) and Resource Management
0
48
504
474
2024-06-29T16:02:10Z
Weiler
3
wikitext
text/x-wiki
== Partitions ==
Due to heterogeneous workloads and differing batch requirements, we have implemented partitions in Slurm, which are similar to queues.
Each partition has different default and maximum walltime limits (aka "runtime" limits). You will need to select a partition to launch your jobs in based on what kind of jobs they are and how long they are expected to run.
{| class="wikitable"
|- style="font-weight:bold;"
! Partition Name
! Default Walltime Limit
! Maximum Walltime Limit
! style="border-color:inherit;" | Default Partition?
! Job Priority
! Maximum Nodes Utilized
|-
| short
| 10 minutes
| 1 hour
| style="border-color:inherit;" | Yes
| Normal
| All
|-
| medium
| 1 hour
| 12 hours
| style="border-color:inherit;" | No
| Normal
| 15
|-
| long
| 12 hours
| 14 days
| style="border-color:inherit;" | No
| Normal
| 10
|-
| high_priority
| 10 minutes
| 7 days
| style="border-color:inherit;" | No
| High
| All<br />
|-
| gpu
| 10 minutes
| 7 days
| No
| Normal
| 6
|}
If you do not specify a partition for your job (with e.g. <code>--partition=medium</code>), it will be assigned the "short" partition by default. If you do not specify a walltime value in your job submission script (with e.g. <code>--time=00:30:00</code>), it will inherit the "Default Walltime Limit" of the partition it is assigned. It is therefore a very good idea to specify both the partition and a walltime limit explicitly; otherwise your jobs will inherit the defaults in the chart above.
This all means that it is very important to '''TEST''' your jobs before running many of them! Submit one job and note how many resources it takes (RAM, CPU) and how long it takes to run. Then, when you submit many of those jobs, you can correctly specify the number of CPU cores each job needs, how much RAM it needs (pad it by about 20% just in case), and how much time it needs to run (pad it by about 40% to account for environmental variables like disk IO load and CPU context switching load).
You can test your jobs by running one job via '''srun''' with fairly high CPU, RAM and walltime limits (just so it isn't killed due to default limits), then, after it finishes, noting how many resources it consumed.
'''Example'''
seff 769059
'''Output'''
Job ID: 769059
Cluster: phoenix
User/Group: <user-name>/<group-name>
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 16
CPU Utilized: 00:00:01
CPU Efficiency: 0.11% of 00:15:28 core-walltime
Job Wall-clock time: 00:00:58
Memory Utilized: 4.79 MB
Memory Efficiency: 4.79% of 100.00 MB
So if I needed to run, say, 1000 of these jobs, and they were all similar, I would select the "short" partition, 1 CPU core, maybe 8MB of RAM, and maybe a 90 second walltime limit. Note how I padded the RAM and walltime a bit to account for unexpectedly variable cluster conditions.
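Putting that together, a hypothetical batch header for that 1000-job run might look like the following (the numbers come from the '''seff''' output above, padded as described):

```shell
#!/bin/bash
#SBATCH --partition=short     # each job finishes in about a minute, well under the 1 hour cap
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1     # CPU utilization was tiny, so one core is plenty
#SBATCH --mem=8M              # measured ~4.8 MB, padded
#SBATCH --time=00:01:30       # measured 58 seconds of wall-clock time, padded
#SBATCH --array=0-999         # 1000 jobs in one submission
```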
== '''high_priority''' Partition Notes ==
The "high_priority" partition is special in that its jobs have the highest priority on the cluster and will push all other jobs aside so that they finish as fast as possible. It is reserved for emergency or mission-critical batches that unexpectedly must be completed very quickly. Access to this partition is granted only on a per-request basis, and is temporary until your batch finishes. Email '''cluster-admin@soe.ucsc.edu''' if you need access to the high_priority queue, and make your case for why it is necessary.
== My job is not running but I want it to be running ==
Even if your job is in the high-priority partition, that doesn't mean that the cluster will drop everything and run it immediately. Because we don't have pre-emption set up, high priority jobs still have to wait for currently-running jobs to finish, as well as for other high-priority jobs. And since, as noted above, jobs can be allowed to run for up to 7 days each, it is physically possible for even the highest-priority job in the whole cluster to not start for a whole week.
Here is a [https://docs-research-it.berkeley.edu/services/high-performance-computing/user-guide/running-your-jobs/why-job-not-run/ good resource from Berkeley] about understanding and debugging Slurm job scheduling. Basically, Slurm uses the wall-clock limits of running jobs, and of jobs in the queue, to make a plan to start each job on some node at some time in the future. If jobs finish early, other jobs can start sooner than scheduled, and if there is space around higher-priority jobs, lower-priority jobs can be filled in.
If you want to know when Slurm plans to run your job, and why that is not right now, you can use the <code>--start</code> option for the <code>squeue</code> command:
$ squeue -j 1719584 --start
JOBID PARTITION NAME USER ST START_TIME NODES SCHEDNODES NODELIST(REASON)
1719584 short snakemak flastnam PD 2024-01-22T10:20:00 1 phoenix-00 (Priority)
The <code>START_TIME</code> column is the time by which Slurm is sure it will be able to start your job if no higher-priority jobs come in first, and the <code>NODELIST(REASON)</code> column shows the nodes the job is running on, or the reason it is not running now, in parentheses. In this case, the job is not running because higher-priority jobs are in the way.
c608da236cd130864be84897fbaa66b9a3dd1a6c
Phoenix WDL Tutorial
0
45
505
473
2024-07-16T20:06:55Z
Anovak
4
/* Connecting to Phoenix */ Use emerald login node
wikitext
text/x-wiki
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to any of the machines with access to the Phoenix cluster. These interactive nodes are fairly large machines that can do some work locally, but you will still want to run larger workflows on the actual cluster. For this tutorial, we will use <code>emerald.prism</code> as our login node.
To connect to the cluster:
1. Connect to the VPN.
2. SSH to <code>emerald.prism</code>. At the command line, run:
ssh emerald.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@emerald.prism
The first time you connect, you will see a message like:
The authenticity of host 'emerald.prism (10.50.1.67)' can't be established.
ED25519 key fingerprint is SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>emerald.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter '''will not''' look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''' to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
==Configuring your Phoenix Environment==
'''Do not try to store data in your home directory on Phoenix!''' The home directories are meant for code and programs. Any data worth running a workflow on should be in a directory under <code>/private/groups</code>. You will probably need to email the admins to get added to a group so you can create a directory to work in somewhere under <code>/private/groups</code>. Usually you would end up with <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code>.
Remember this path; we will need it later.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. On the Phoenix cluster, though, we do have a shared filesystem, so we should configure Toil to use it for caching the Docker container images used for running workflow steps. Since these images can be large, and the home directory quota is only 30 GB, we can't keep them in your home directory. We will need to use the <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code> directory you created earlier.
Make that directory available in your <code>~/.bashrc</code> file by editing this command to use your own group and user names, and then running it:
echo 'BIG_DATA_DIR=/private/groups/YOURGROUPNAME/YOURUSERNAME' >>~/.bashrc
Then use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${BIG_DATA_DIR}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${BIG_DATA_DIR}/.cache/miniwdl"' >>~/.bashrc
After that, '''log out and log back in again''', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day].
'''If you get errors about mutexes, lock files, or other weird problems with Singularity''', try moving these directories to inside <code>/data/tmp</code> on the individual nodes, or unsetting them and letting Toil use its defaults (and exhaust our Docker pull limits). [https://github.com/DataBiosphere/toil/issues/4654 It is not clear that <code>/private/groups</code> actually implements the necessary file locking correctly.]
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
To start, go to your user directory under <code>/private/groups</code>, and make a directory to work in.
cd /private/groups/YOURGROUPNAME/YOURUSERNAME
mkdir workflow-test
cd workflow-test
Next, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
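If you like, you can also write and check the inputs file programmatically. Here is a minimal Python sketch; the workflow and file names match the ones above, and nothing else is assumed:

```python
import json

# The key is the workflow name, a dot, and then the input name;
# the value is the file's path relative to the inputs file.
inputs = {"hello_caller.who": "./names.txt"}

with open("inputs.json", "w") as f:
    json.dump(inputs, f)

# Read it back to confirm it is valid JSON with the expected key.
with open("inputs.json") as f:
    assert json.load(f)["hello_caller.who"] == "./names.txt"
```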
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --time=02:00:00 --partition=medium --pty bash -i
This will start a new shell that can run for 2 hours; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
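Those <code>\u00f3</code>-style sequences are just JSON's ASCII-safe escaping, and any JSON parser will restore the original characters. A quick Python illustration:

```python
import json

# The runner's output JSON escapes non-ASCII characters;
# parsing the JSON turns them back into real text.
escaped = '["Hello, Mridula Resurrecci\\u00f3n!", "Hello, Gershom \\u0160arlota!"]'
messages = json.loads(escaped)
print(messages[0])  # Hello, Mridula Resurrección!
```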
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
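If that gist ever goes away, any list of names will do; this hedged Python sketch just invents 100 placeholder names as a stand-in for <code>100_names.txt</code> (the actual contents don't matter to this demo workflow):

```python
# Write 100 made-up names, one per line, as a stand-in for 100_names.txt.
names = ["Person %d" % i for i in range(1, 101)]
with open("100_names.txt", "w") as f:
    f.write("\n".join(names) + "\n")
```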
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. Until Toil [https://github.com/DataBiosphere/toil/issues/4775 gets support for data file caching on Slurm], we will also need the <code>--caching false</code> option. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
Additionally, since [https://github.com/DataBiosphere/toil/issues/4686 Toil can't manage Slurm partitions itself], we will use the <code>TOIL_SLURM_ARGS</code> environment variable to tell Toil how long jobs should be allowed to run for (2 hours) and what [[Slurm Queues (Partitions) and Resource Management | partition]] they should go in.
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --caching false --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
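If Python is your frame of reference, the scatter above behaves roughly like a list comprehension or <code>map()</code>, except that a WDL runner may execute the iterations in parallel. A sketch:

```python
# WDL:  Array[Int] numbers = range(item_count)
# WDL:  scatter (i in numbers) { Int one_based = i + 1 }
item_count = 5
numbers = list(range(item_count))
one_based = [i + 1 for i in numbers]
print(one_based)  # [1, 2, 3, 4, 5]
```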
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
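To see how these pieces will fit together, here is a rough Python analogy, with <code>None</code> standing in for WDL's <code>null</code>: a variable declared in a branch that didn't run stays null, and <code>select_first()</code> picks the first non-null value.

```python
def select_first(values):
    # Like WDL select_first(): return the first non-None value.
    return next(v for v in values if v is not None)

def classify(one_based, to_fizz=3, to_buzz=5, fizzbuzz_override=None):
    # Variables from branches that don't run stay None, mirroring WDL.
    fizz = "Fizz" if one_based % to_fizz == 0 else None
    buzz = "Buzz" if one_based % to_buzz == 0 else None
    fizzbuzz = None
    if fizz is not None and buzz is not None:
        fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
    # "Just a normal number" is the fallback, as a string.
    return select_first([fizzbuzz, fizz, buzz, str(one_based)])

print([classify(n) for n in range(1, 16)])
```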
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, but only if we didn't make a noise instead.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
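This <code>read_string(stdout())</code> pattern is much like capturing a command's standard output and trimming the trailing newline yourself; a Python sketch of the same idea:

```python
import subprocess

# Run the task's command and capture its standard output,
# like WDL's stdout().
result = subprocess.run(["echo", "7"], capture_output=True, text=True, check=True)

# Like read_string(), strip the trailing newline.
the_string = result.stdout.rstrip("\n")
print(repr(the_string))  # '7'
```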
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We'll also tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, which may suggest that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need one, but you do need it in 1.1, and Toil doesn't yet send your outputs anywhere if you don't have one, so we're going to write one. We need to collect all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --caching false --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Restarting the Workflow==
If you think your workflow failed from a transient problem (such as a Docker image not being available) that you have fixed, and you ran the workflow with <code>--jobStore</code> set manually to a directory that persists between attempts, you can add <code>--restart</code> to your workflow command and make Toil try again. It will pick up from where it left off and rerun any failed tasks and then the rest of the workflow.
This ''will not'' pick up any changes to your WDL source code files; those are read once at the beginning and not re-read on restart.
If restarting the workflow doesn't help, you may need to move on to more advanced debugging techniques.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs and the output of commands run from WDL are reproduced like this.
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which will happen when either the command line command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stderr follows:
and:
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stdout follows:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
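In other words, the on-disk location is just the file ID appended to your <code>--jobStore</code> path, which you can assemble with any path-joining tool; for example, in Python:

```python
import posixpath

# Example values taken from the log excerpt above.
job_store = "/private/groups/patenlab/anovak/jobstore"  # your --jobStore value
file_id = ("files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/"
           "file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam")
print(posixpath.join(job_store, file_id))
```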
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
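You can also do the decoding without a web tool; Python's standard library handles the percent-encoding, and the job-store-relative path is everything after the last colon:

```python
from urllib.parse import unquote

# The toilfile: URI from the log excerpt above.
uri = ("toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob"
       "%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351"
       "%2FSample.chr14.bam/Sample.chr14.bam")

decoded = unquote(uri)
# The job-store-relative path is everything after the last colon.
path = decoded.rsplit(":", 1)[-1]
print(path)
```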
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to any of the machines with access to the Phoenix cluster. These interactive nodes are fairly large machines that can do some work locally, but you will still want to run larger workflows on the actual cluster. For this tutorial, we will use <code>emerald.prism</code> as our login node.
To connect to the cluster:
1. Connect to the VPN.
2. SSH to <code>emerald.prism</code>. At the command line, run:
ssh emerald.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@emerald.prism
The first time you connect, you will see a message like:
The authenticity of host 'emerald.prism (10.50.1.67)' can't be established.
ED25519 key fingerprint is SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>emerald.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter *will not* look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, **log out and log back in**, to restart bash and pick up the change.
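If you would rather not log out right away, you can also apply the change to your current shell session and check the lookup order. This is a sketch, assuming Bash:

```shell
# Prepend Toil's install location to PATH for this session only
export PATH="${HOME}/.local/bin:${PATH}"

# The first PATH entry (everything before the first colon) should now be ~/.local/bin
echo "${PATH%%:*}"
```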
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
==Configuring your Phoenix Environment==
'''Do not try to store data in your home directory on Phoenix!''' The home directories are meant for code and programs. Any data worth running a workflow on should be in a directory under <code>/private/groups</code>. You will probably need to email the admins to get added to a group so you can create a directory to work in somewhere under <code>/private/groups</code>. Usually you would end up with <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code>.
Remember this path; we will need it later.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. However, since these files can be large, and the home directory quota is only 30 GB, we can't keep these in your home directory. We will need to use the <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code> directory you created earlier.
Record that directory in your <code>~/.bashrc</code> file by editing this command to use your actual group and username, and then running it:
echo 'BIG_DATA_DIR=/private/groups/YOURGROUPNAME/YOURUSERNAME' >>~/.bashrc
Then use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${BIG_DATA_DIR}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${BIG_DATA_DIR}/.cache/miniwdl"' >>~/.bashrc
After that, '''log out and log back in again''', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day].
'''If you get errors about mutexes, lock files, or other weird problems with Singularity''', try moving these directories to inside <code>/data/tmp</code> on the individual nodes, or unsetting them and letting Toil use its defaults (and exhaust our Docker pull limits). [https://github.com/DataBiosphere/toil/issues/4654 It is not clear that <code>/private/groups</code> actually implements the necessary file locking correctly.]
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, go to your user directory under <code>/private/groups</code>, and make a directory to work in.
cd /private/groups/YOURGROUPNAME/YOURUSERNAME
mkdir workflow-test
cd workflow-test
Next, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
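A malformed inputs file is a common source of confusing errors, so it can be worth validating the JSON before running anything. This is a sketch, assuming <code>python3</code> is available on the head node; it recreates the inputs file from above so it is self-contained:

```shell
# (Re)create the inputs file, then pretty-print it;
# python3 -m json.tool fails loudly if the JSON is malformed
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
python3 -m json.tool inputs.json
```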
==Testing at small scale single-machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --time=02:00:00 --partition=medium --pty bash -i
This will start a new shell that can run for 2 hours; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. Until Toil [https://github.com/DataBiosphere/toil/issues/4775 gets support for data file caching on Slurm], we will also need the <code>--caching false</code> option. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
Additionally, since [https://github.com/DataBiosphere/toil/issues/4686 Toil can't manage Slurm partitions itself], we will use the <code>TOIL_SLURM_ARGS</code> environment variable to tell Toil how long jobs should be allowed to run for (2 hours) and what [[Slurm Queues (Partitions) and Resource Management | partition]] they should go in.
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --caching false --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
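Because of the <code>-m</code> option, the workflow's output JSON is also saved to <code>slurm_run.json</code>, and you can inspect it from the command line. This is a sketch assuming <code>python3</code>; the sample file written first is an illustrative stand-in for a real <code>slurm_run.json</code>:

```shell
# Stand-in for the real slurm_run.json produced by the -m option (illustrative)
echo '{"hello_caller.messages": ["Hello, Ada!", "Hello, Grace!"]}' >slurm_run.json

# Count the greetings recorded in the outputs JSON
python3 -c 'import json; print(len(json.load(open("slurm_run.json"))["hello_caller.messages"]))'
```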
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
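For example, a matching inputs file only has to provide <code>item_count</code>; the other inputs have defaults or are optional. A sketch, with illustrative values:

```shell
# Write an example inputs file for this workflow (values here are illustrative)
cat >example_inputs.json <<'EOF'
{
  "FizzBuzz.item_count": 15,
  "FizzBuzz.fizzbuzz_override": "Fizz-Buzz"
}
EOF
```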
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
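The decision logic above is the familiar FizzBuzz cascade. In plain Bash, without WDL's single-assignment scatter rules, it would look like this; it is only an illustration of the logic, not something the workflow runs:

```shell
# FizzBuzz decision logic from the WDL above, as an ordinary shell loop
to_fizz=3
to_buzz=5
for one_based in $(seq 1 15); do
  if [ $((one_based % to_fizz)) -eq 0 ] && [ $((one_based % to_buzz)) -eq 0 ]; then
    echo "FizzBuzz"
  elif [ $((one_based % to_fizz)) -eq 0 ]; then
    echo "Fizz"
  elif [ $((one_based % to_buzz)) -eq 0 ]; then
    echo "Buzz"
  else
    # Just a normal number
    echo "$one_based"
  fi
done
```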
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, but only in the branch where we don't make a noise instead.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
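The trailing-newline detail matters: <code>echo</code> ends its output with a newline, but <code>read_string()</code> strips it, much like shell command substitution does. A quick shell illustration:

```shell
# Command substitution strips the trailing newline, like WDL's read_string()
captured=$(echo 42)
printf '[%s]\n' "$captured"   # prints [42]
```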
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We're also going to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, which may suggest that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need this, but you do need it in 1.1 and Toil doesn't actually send your outputs anywhere yet if you don't have one, so we're going to make one. We need to collect together all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --caching false --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Restarting the Workflow==
If you think your workflow failed from a transient problem (such as a Docker image not being available) that you have fixed, and you ran the workflow with <code>--jobStore</code> set manually to a directory that persists between attempts, you can add <code>--restart</code> to your workflow command and make Toil try again. It will pick up from where it left off and rerun any failed tasks and then the rest of the workflow.
This ''will not'' pick up any changes to your WDL source code files; those are read once at the beginning and not re-read on restart.
If restarting the workflow doesn't help, you may need to move on to more advanced debugging techniques.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs and the output of commands run from WDL are reproduced like this.
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which happens either when the command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stderr follows:
And
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stdout follows:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this.
=== Automatically Fetching Input Files ===
The <code>toil debug-job</code> command has a <code>--retrieveTaskDirectory</code> option that lets you dump out a directory with all the files that a failing WDL task would use. You can use it like:
toil debug-job ./jobstore WDLTaskJob --retrieveTaskDirectory dumpdir
If there are multiple failing tasks, you might need to replace <code>WDLTaskJob</code> with the name of one of the failing jobs. See [https://toil.readthedocs.io/en/latest/running/debugging.html#fetching-job-inputs the Toil documentation on retrieving files] for more on how to use this command.
=== Manually Finding Input Files ===
If you can't use <code>toil debug-job</code>, you might need to manually dig through the job store for files. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
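You can also do the decoding and extraction directly at the command line instead of using a web tool. A sketch assuming <code>python3</code> is available, using the URI from the log line above:

```shell
# URL-decode the toilfile: URI from the log
uri='toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam'
decoded=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote(sys.argv[1]))' "$uri")
echo "$decoded"

# The job-store-relative path is everything after the last colon
echo "${decoded##*:}"
```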
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
171095d21b0be8fa6b96e9d38687a3fdf4fd7083
507
506
2024-07-16T20:20:12Z
Anovak
4
/* Reading the Log */ Explain --writeLogs
wikitext
text/x-wiki
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to any of the machines with access to the Phoenix cluster. These interactive nodes are fairly large machines that can do some work locally, but you will still want to run larger workflows on the actual cluster. For this tutorial, we will use <code>emerald.prism</code> as our login node.
To connect to the cluster:
1. Connect to the VPN.
2. SSH to <code>emerald.prism</code>. At the command line, run:
ssh emerald.prism
If your username on the cluster (say, <code>flastname</code>) is different from your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@emerald.prism
The first time you connect, you will see a message like:
The authenticity of host 'emerald.prism (10.50.1.67)' can't be established.
ED25519 key fingerprint is SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>emerald.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will also need to install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter '''will not''' look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''', to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
==Configuring your Phoenix Environment==
'''Do not try to store data in your home directory on Phoenix!''' The home directories are meant for code and programs. Any data worth running a workflow on should be in a directory under <code>/private/groups</code>. You will probably need to email the admins to get added to a group so you can create a directory to work in somewhere under <code>/private/groups</code>. Usually you would end up with <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code>.
Remember this path; we will need it later.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. On the Phoenix cluster, however, we do have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. Since these files can be large, and the home directory quota is only 30 GB, we can't keep them in your home directory. We will need to use the <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code> directory you created earlier.
Record that directory in your <code>~/.bashrc</code> file by editing this command to match your group and username, and then running it:
echo 'BIG_DATA_DIR=/private/groups/YOURGROUPNAME/YOURUSERNAME' >>~/.bashrc
Then use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${BIG_DATA_DIR}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${BIG_DATA_DIR}/.cache/miniwdl"' >>~/.bashrc
After that, '''log out and log back in again''', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day].
'''If you get errors about mutexes, lock files, or other weird problems with Singularity''', try moving these directories to inside <code>/data/tmp</code> on the individual nodes, or unsetting them and letting Toil use its defaults (and exhaust our Docker pull limits). [https://github.com/DataBiosphere/toil/issues/4654 It is not clear that <code>/private/groups</code> actually implements the necessary file locking correctly.]
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, go to your user directory under <code>/private/groups</code>, and make a directory to work in.
cd /private/groups/YOURGROUPNAME/YOURUSERNAME
mkdir workflow-test
cd workflow-test
Next, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
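If you are generating inputs files from a script rather than by hand, the same structure can be produced with any JSON library. A minimal Python sketch (the key matches the self-test workflow above; writing to <code>inputs.json</code> mirrors the shell command):

```python
import json

# Keys are "<workflow name>.<input name>"; values are strings, numbers,
# booleans, or (for File inputs) paths relative to this inputs file.
inputs = {"hello_caller.who": "./names.txt"}

with open("inputs.json", "w") as f:
    json.dump(inputs, f)

print(json.dumps(inputs))
```

This is exactly equivalent to the <code>echo</code> command above; a JSON library just becomes more convenient once a workflow has many inputs.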
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --time=02:00:00 --partition=medium --pty bash -i
This will start a new shell that can run for 2 hours; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
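Those escapes are ordinary JSON string escaping, not mangled output; any JSON parser restores the original characters. For example, in Python:

```python
import json

# The workflow's output JSON escapes non-ASCII characters, but
# parsing it restores the original filenames.
raw = '["local_run/Mridula Resurrecci\\u00f3n.txt"]'
files = json.loads(raw)
print(files[0])  # local_run/Mridula Resurrección.txt
```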
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. Until Toil [https://github.com/DataBiosphere/toil/issues/4775 gets support for data file caching on Slurm], we will also need the <code>--caching false</code> option. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
Additionally, since [https://github.com/DataBiosphere/toil/issues/4686 Toil can't manage Slurm partitions itself], we will use the <code>TOIL_SLURM_ARGS</code> environment variable to tell Toil how long jobs should be allowed to run for (2 hours) and what [[Slurm Queues (Partitions) and Resource Management | partition]] they should go in.
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --caching false --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
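The analogy to Python's <code>map()</code> is close; as an illustration (not how Toil actually executes it), the scatter above computes the same thing as:

```python
# The scatter body runs once per element; the runner may execute the
# iterations in parallel, but the gathered result is the same as a map.
item_count = 5
numbers = list(range(item_count))     # like WDL range()
one_based = [i + 1 for i in numbers]  # like the scatter body
print(one_based)  # [1, 2, 3, 4, 5]
```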
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we put it in an array together with a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, though only where the call actually ran (here, only when we didn't make a noise instead).
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
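To see what the conditionals and <code>select_first()</code> combine to, here is a plain Python sketch of the same per-number logic (an illustration only; the argument names mirror the workflow inputs, and <code>str()</code> stands in for the <code>stringify_number</code> task):

```python
def select_first(values):
    # Like WDL select_first(): return the first non-null value.
    for v in values:
        if v is not None:
            return v
    raise ValueError("all values were null")

def fizzbuzz_result(one_based, to_fizz=3, to_buzz=5, fizzbuzz_override=None):
    # Each conditional either sets its variable or leaves it "null",
    # just like the un-executed WDL conditionals.
    fizz = "Fizz" if one_based % to_fizz == 0 else None
    buzz = "Buzz" if one_based % to_buzz == 0 else None
    fizzbuzz = None
    if fizz is not None and buzz is not None:
        fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
    # The stringify_number task only runs for the "normal" numbers.
    stringified = str(one_based) if fizz is None and buzz is None else None
    return select_first([fizzbuzz, fizz, buzz, stringified])

print([fizzbuzz_result(n) for n in range(1, 16)])
```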
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
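The <code>command</code>/<code>stdout()</code>/<code>read_string()</code> round trip behaves like running the command, capturing its output, and stripping the trailing newline. A rough Python equivalent (assuming a POSIX <code>echo</code> is available):

```python
import subprocess

the_number = 42
# Run the same echo command the task would run and capture its
# standard output, like WDL's stdout().
proc = subprocess.run(["echo", str(the_number)],
                      capture_output=True, text=True, check=True)
# read_string() also removes the trailing newline that echo adds.
the_string = proc.stdout.rstrip("\n")
print(repr(the_string))  # '42'
```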
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We'll also tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, which may suggest that it be <code>SSD</code> storage.
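Assuming the string always has the Cromwell <code>local-disk</code> form shown above, it can be picked apart like this (a hypothetical helper for illustration, not part of Toil):

```python
def parse_disks(spec):
    # "local-disk <size in GB> <HDD|SSD>" -- Cromwell-style disks string.
    name, size_gb, medium = spec.split()
    return name, int(size_gb), medium

print(parse_disks("local-disk 1 SSD"))  # ('local-disk', 1, 'SSD')
```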
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need this, but you do need it in 1.1, and Toil doesn't actually send your outputs anywhere yet if you don't have one, so we're going to make one. We need to collect together all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --caching false --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Restarting the Workflow==
If you think your workflow failed from a transient problem (such as a Docker image not being available) that you have fixed, and you ran the workflow with <code>--jobStore</code> set manually to a directory that persists between attempts, you can add <code>--restart</code> to your workflow command and make Toil try again. It will pick up from where it left off and rerun any failed tasks and then the rest of the workflow.
This ''will not'' pick up any changes to your WDL source code files; those are read once at the beginning and not re-read on restart.
If restarting the workflow doesn't help, you may need to move on to more advanced debugging techniques.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs and the output of commands run from WDL are reproduced like this.
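If you need to pull those per-job sections out of a long debug log, splitting on the markers works; a small sketch (the marker strings are the ones shown above):

```python
def job_log_sections(log_text):
    # Return the text between each "=========>" / "<=========" pair.
    sections = []
    for chunk in log_text.split("=========>")[1:]:
        body = chunk.split("<=========")[0]
        sections.append(body.strip())
    return sections

example = "noise\n=========>\nToil job log is here\n<=========\nmore noise"
print(job_log_sections(example))  # ['Toil job log is here']
```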
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which happens either when the command line is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stderr follows:
And
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stdout follows:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
If you would like individual task logs to be saved separately for later reference, you can use the <code>--writeLogs</code> option to specify a directory to store them. For more information, see [https://toil.readthedocs.io/en/latest/wdl/running.html#managing-workflow-logs the Toil documentation of workflow task logs].
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this.
=== Automatically Fetching Input Files ===
The <code>toil debug-job</code> command has a <code>--retrieveTaskDirectory</code> option that lets you dump out a directory with all the files that a failing WDL task would use. You can use it like:
toil debug-job ./jobstore WDLTaskJob --retrieveTaskDirectory dumpdir
If there are multiple failing tasks, you might need to replace <code>WDLTaskJob</code> with the name of one of the failing jobs. See [https://toil.readthedocs.io/en/latest/running/debugging.html#fetching-job-inputs the Toil documentation on retrieving files] for more on how to use this command.
=== Manually Finding Input Files ===
If you can't use <code>toil debug-job</code>, you might need to manually dig through the job store for files. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to look for the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
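The decode-and-extract steps above can be scripted with Python's standard library (the URI is the one from the example log line):

```python
from urllib.parse import unquote

# The toilfile: URI as it appears in the debug log, percent-encoded.
uri = ("toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob"
       "%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351"
       "%2FSample.chr14.bam/Sample.chr14.bam")
decoded = unquote(uri)
# The path relative to the job store is everything after the last colon.
relative_path = decoded.rsplit(":", 1)[1]
print(relative_path)
```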
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
This happens because Slurm is providing Toil with an <code>XDG_RUNTIME_DIR</code> environment variable that points to a directory that doesn't exist, which the XDG spec says it shouldn't be doing. This is a known bug in the GI Slurm configuration, and Toil is letting you know that it is working around it.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
Revision 508 (parent 507), 2024-07-16T20:24:49Z, by Anovak: /* Frequently Asked Questions */ Note there's no more XDG_RUNTIME_DIR warning.
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to any of the machines with access to the Phoenix cluster. These interactive nodes are fairly large machines that can do some work locally, but you will still want to run larger workflows on the actual cluster. For this tutorial, we will use <code>emerald.prism</code> as our login node.
To connect to the cluster:
1. Connect to the VPN.
2. SSH to <code>emerald.prism</code>. At the command line, run:
ssh emerald.prism
If your username on the cluster (say, <code>flastname</code>) is different from your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@emerald.prism
The first time you connect, you will see a message like:
The authenticity of host 'emerald.prism (10.50.1.67)' can't be established.
ED25519 key fingerprint is SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide whether it is talking to the genuine <code>emerald.prism</code>, and not an impostor. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter '''will not''' look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''', to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
==Configuring your Phoenix Environment==
'''Do not try to store data in your home directory on Phoenix!''' The home directories are meant for code and programs. Any data worth running a workflow on should be in a directory under <code>/private/groups</code>. You will probably need to email the admins to get added to a group so you can create a directory to work in somewhere under <code>/private/groups</code>. Usually you would end up with <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code>.
Remember this path; we will need it later.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. On the Phoenix cluster, though, we do have a shared filesystem, so we should configure Toil to use it for caching the Docker container images used for running workflow steps. Since these files can be large, and the home directory quota is only 30 GB, we can't keep them in your home directory. We will need to use the <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code> directory you created earlier.
Make sure that directory is recorded in your <code>~/.bashrc</code> file by editing this command to use your own path and then running it:
echo 'BIG_DATA_DIR=/private/groups/YOURGROUPNAME/YOURUSERNAME' >>~/.bashrc
Then use these commands to make sure that Toil knows where it ought to put its caches:
echo 'export SINGULARITY_CACHEDIR="${BIG_DATA_DIR}/.singularity/cache"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${BIG_DATA_DIR}/.cache/miniwdl"' >>~/.bashrc
After that, '''log out and log back in again''', to apply the changes.
If you don't do this, Toil will re-download each container image, on each node, for each run of each workflow. That wastes a lot of time, and can exhaust the [https://docs.docker.com/docker-hub/download-rate-limit/#whats-the-download-rate-limit-on-docker-hub limits on how many containers you are allowed to download each day].
'''If you get errors about mutexes, lock files, or other weird problems with Singularity''', try moving these directories to inside <code>/data/tmp</code> on the individual nodes, or unsetting them and letting Toil use its defaults (and exhaust our Docker pull limits). [https://github.com/DataBiosphere/toil/issues/4654 It is not clear that <code>/private/groups</code> actually implements the necessary file locking correctly.]
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, go to your user directory under <code>/private/groups</code>, and make a directory to work in.
cd /private/groups/YOURGROUPNAME/YOURUSERNAME
mkdir workflow-test
cd workflow-test
Next, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
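If you are generating inputs files from a script, the same structure can be produced with any JSON library. Here is a minimal Python sketch (the filename <code>inputs.json</code> matches the one created above):

```python
import json

# Keys are "<workflow name>.<input name>"; values can be strings, numbers,
# booleans, lists, or objects. File inputs are given as paths (relative to
# the inputs file's location) or URLs.
inputs = {"hello_caller.who": "./names.txt"}

with open("inputs.json", "w") as f:
    json.dump(inputs, f)
```

This produces exactly the same one-line JSON file as the <code>echo</code> command above.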
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --time=02:00:00 --partition=medium --pty bash -i
This will start a new shell that can run for 2 hours; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
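You can see how those <code>\u00f3</code>-style escape sequences map back to real characters by parsing a line of that JSON, for example in Python:

```python
import json

# JSON output often escapes non-ASCII characters, so "ó" appears as "\u00f3".
# Parsing the JSON turns the escapes back into the characters they stand for.
line = '{"files": ["local_run/Mridula Resurrecci\\u00f3n.txt"]}'
parsed = json.loads(line)
print(parsed["files"][0])
```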
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. Until Toil [https://github.com/DataBiosphere/toil/issues/4775 gets support for data file caching on Slurm], we will also need the <code>--caching false</code> option. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
Additionally, since [https://github.com/DataBiosphere/toil/issues/4686 Toil can't manage Slurm partitions itself], we will use the <code>TOIL_SLURM_ARGS</code> environment variable to tell Toil how long jobs should be allowed to run for (2 hours) and what [[Slurm Queues (Partitions) and Resource Management | partition]] they should go in.
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --caching false --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
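For example, here is a sketch of an inputs file that satisfies these rules (written in Python just to produce the JSON; the particular values are only for illustration):

```python
import json

# Required inputs must be named; inputs with defaults can be omitted,
# and omitted optional inputs stay null.
inputs = {
    "FizzBuzz.item_count": 20,  # required: no default in the WDL
    "FizzBuzz.to_buzz": 7,      # overrides the default of 5
    # "FizzBuzz.fizzbuzz_override" is omitted, so it stays null
}
print(json.dumps(inputs, indent=2))
```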
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
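Continuing the Python analogy, the scatter above behaves like a list comprehension over the input array (a sketch only; <code>item_count</code> is set to 20 purely for illustration, and this is not how Toil actually executes scatters):

```python
# WDL: Array[Int] numbers = range(item_count)
item_count = 20
numbers = list(range(item_count))

# WDL: scatter (i in numbers) { Int one_based = i + 1 }
# The scatter body runs once per element; outside the scatter, the
# per-element results are seen as an array.
one_based = [i + 1 for i in numbers]
print(one_based[:5])
```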
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, at least for the numbers where we don't make a noise instead.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
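To see what the scatter-plus-conditionals logic computes, here is the same selection sketched in plain Python (an analogy only; the <code>classify</code> helper is made up for this sketch, and this is not how Toil runs scatters):

```python
def select_first(values):
    # Mirrors WDL select_first(): return the first non-null value.
    for v in values:
        if v is not None:
            return v
    raise ValueError("all values were null")

def classify(one_based, to_fizz=3, to_buzz=5, fizzbuzz_override=None):
    # Each conditional either defines its variable or leaves it None,
    # just as un-executed WDL conditionals leave their variables null.
    fizz = "Fizz" if one_based % to_fizz == 0 else None
    fizzbuzz = None
    if fizz is not None and one_based % to_buzz == 0:
        fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
    buzz = "Buzz" if one_based % to_buzz == 0 else None
    number = str(one_based) if fizz is None and buzz is None else None
    return select_first([fizzbuzz, fizz, buzz, number])

print([classify(n) for n in range(1, 16)])
```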
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. And we're going to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, which may suggest that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need one, but you do need it in 1.1, and Toil doesn't actually send your outputs anywhere yet if you don't have one, so we're going to make one. We need to collect together all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --caching false --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Restarting the Workflow==
If you think your workflow failed from a transient problem (such as a Docker image not being available) that you have fixed, and you ran the workflow with <code>--jobStore</code> set manually to a directory that persists between attempts, you can add <code>--restart</code> to your workflow command and make Toil try again. It will pick up from where it left off and rerun any failed tasks and then the rest of the workflow.
This ''will not'' pick up any changes to your WDL source code files; those are read once at the beginning and not re-read on restart.
If restarting the workflow doesn't help, you may need to move on to more advanced debugging techniques.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs and the output of commands run from WDL are reproduced like this.
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which will happen either when the command line command is written wrong, or when the tool you are trying to run detects and reports an error.
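For intuition about exit statuses, here is a small Python illustration (not Toil code): a process that exits 0 succeeded, and anything nonzero is what gets reported as a failed task command:

```python
import subprocess
import sys

# Run two child processes: one that exits 0 (success) and one that
# exits 1 (failure). Toil reports the latter as
# "task command failed with exit status 1".
ok = subprocess.run([sys.executable, "-c", "raise SystemExit(0)"])
bad = subprocess.run([sys.executable, "-c", "raise SystemExit(1)"])
print(ok.returncode, bad.returncode)
```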
Go up higher in the log until you find lines that look like:
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stderr follows:
And
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stdout follows:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
If you would like individual task logs to be saved separately for later reference, you can use the <code>--writeLogs</code> option to specify a directory to store them. For more information, see [https://toil.readthedocs.io/en/latest/wdl/running.html#managing-workflow-logs the Toil documentation of workflow task logs].
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this.
=== Automatically Fetching Input Files ===
The <code>toil debug-job</code> command has a <code>--retrieveTaskDirectory</code> option that lets you dump out a directory with all the files that a failing WDL task would use. You can use it like:
toil debug-job ./jobstore WDLTaskJob --retrieveTaskDirectory dumpdir
If there are multiple failing tasks, you might need to replace <code>WDLTaskJob</code> with the name of one of the failing jobs. See [https://toil.readthedocs.io/en/latest/running/debugging.html#fetching-job-inputs the Toil documentation on retrieving files] for more on how to use this command.
=== Manually Finding Input Files ===
If you can't use <code>toil debug-job</code>, you might need to manually dig through the job store for files. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try to find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
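If you'd rather not paste URIs into a web page, you can URL-decode them locally with Python's standard library. A sketch using the example URI above:

```shell
# The toilfile: URI copied from the Toil log (example from above)
uri='toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam'
# URL-decode it with Python's standard urllib
decoded=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote(sys.argv[1]))' "$uri")
echo "$decoded"
# The path relative to the job store is everything after the last colon
echo "${decoded##*:}"
```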
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
You should upgrade Toil. [https://github.com/DataBiosphere/toil/commit/ff6bf60ab798a675c20156c749817c4313644b96 Since Toil 6.1.0], Toil no longer issues this warning, and just puts up with bad <code>XDG_RUNTIME_DIR</code> settings.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
7a50be257ba669ff75b169df112d24616e5be415
509
508
2024-07-16T20:40:10Z
Anovak
4
/* Configuring Toil for Phoenix */ Provide Julian's preferred storage paths, note Ceph bug and default home directory image storage.
wikitext
text/x-wiki
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to any of the machines with access to the Phoenix cluster. These interactive nodes are fairly large machines that can do some work locally, but you will still want to run larger workflows on the actual cluster. For this tutorial, we will use <code>emerald.prism</code> as our login node.
To connect to the cluster:
1. Connect to the VPN.
2. SSH to <code>emerald.prism</code>. At the command line, run:
ssh emerald.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@emerald.prism
The first time you connect, you will see a message like:
The authenticity of host 'emerald.prism (10.50.1.67)' can't be established.
ED25519 key fingerprint is SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>emerald.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
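You can also inspect a key's fingerprint from the command line instead of eyeballing the prompt, using <code>ssh-keygen -lf</code> on a public key (for a remote host you would feed it the output of, for example, <code>ssh-keyscan emerald.prism</code>, which needs VPN access). The sketch below just generates a throwaway local key to show the fingerprint format:

```shell
# Generate a throwaway key pair purely to demonstrate the fingerprint format
# (the real emerald.prism host key is different; this is only a local demo)
ssh-keygen -t ed25519 -N '' -f demo_key -q
# Print its SHA256 fingerprint, the same format shown on first connect
ssh-keygen -lf demo_key.pub
```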
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows. When installing, you need to specify that you want WDL support. To do this, you can run:
pip install --upgrade --user 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pip install --upgrade --user 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>.
By default, the command interpreter '''will not''' look there, so if you type <code>toil-wdl-runner</code>, it will complain that the command is not found. To fix this, you need to configure the command interpreter (bash) to look where Toil is installed. To do this, run:
echo 'export PATH="${HOME}/.local/bin:${PATH}"' >>~/.bashrc
After that, '''log out and log back in''' to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pip</code> command above.
==Configuring your Phoenix Environment==
'''Do not try to store data in your home directory on Phoenix!''' The home directories are meant for code and programs. Any data worth running a workflow on should be in a directory under <code>/private/groups</code>. You will probably need to email the admins to get added to a group so you can create a directory to work in somewhere under <code>/private/groups</code>. Usually you would end up with <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code>.
Remember this path; we will need it later.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, so we should configure Toil to use it for caching the Docker container images used for running workflow steps. Since these images can be large, and the home directory quota is only 30 GB, you might not be able to keep them in your home directory.
We would like to be able to store these on the cluster's large storage array, under <code>/private/groups</code>. However, Toil needs to use file locks in these directories to prevent simultaneous Singularity calls from producing internal Singularity errors, and Ceph currently has [https://tracker.ceph.com/issues/65607 a bug where these file locking operations can freeze the Ceph servers].
If you have '''a small number of container images''' that will fit in your home directory, you can keep them there. [https://github.com/DataBiosphere/toil/commit/cb0b291bb7f6212bfe69221dd9f09d72f83e92fb Since Toil 6.1.0], this is the default behavior and you don't need to do anything. (Unless you previously set <code>SINGULARITY_CACHEDIR</code> or <code>MINIWDL__SINGULARITY__IMAGE_CACHE</code>, in which case you need to unset them.)
'''If you don't have room in your home directory''' for container images, currently the recommended approach is to use node-local storage under <code>/data/tmp</code>. This results in each node pulling each container image, but images will be saved across workflows.
You can set that up for all your workflows with:
echo 'export SINGULARITY_CACHEDIR="/data/tmp/$(whoami)/cache/singularity"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="/data/tmp/$(whoami)/cache/miniwdl"' >>~/.bashrc
Then '''log out and log back in again''', to apply the changes.
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, go to your user directory under <code>/private/groups</code>, and make a directory to work in.
cd /private/groups/YOURGROUPNAME/YOURUSERNAME
mkdir workflow-test
cd workflow-test
Next, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
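If hand-quoting JSON inside <code>echo</code> gets fiddly, you can also write the inputs file with a short Python script. This is just an equivalent sketch that produces the same file:

```shell
# Write the same inputs file with Python instead of hand-quoted echo;
# keys are "<workflow name>.<input name>"
python3 - <<'EOF'
import json

inputs = {"hello_caller.who": "./names.txt"}
with open("inputs.json", "w") as f:
    json.dump(inputs, f, indent=4)
EOF
cat inputs.json
```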
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --time=02:00:00 --partition=medium --pty bash -i
This will start a new shell that can run for 2 hours; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
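The escape sequences in that JSON are standard JSON Unicode escapes, so any JSON tool can decode them. For example, a sketch using Python's standard library (with a faked-up sample of the output, since your exact output will differ):

```shell
# A faked-up sample of the output JSON (escapes as Toil prints them)
printf '%s\n' '{"hello_caller.messages": ["Hello, Gershom \u0160arlota!"]}' > out.json
# Re-print it with the \uXXXX escapes decoded to real characters
python3 -c 'import json; print(json.dumps(json.load(open("out.json")), indent=2, ensure_ascii=False))'
```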
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. Until Toil [https://github.com/DataBiosphere/toil/issues/4775 gets support for data file caching on Slurm], we will also need the <code>--caching false</code> option. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
Additionally, since [https://github.com/DataBiosphere/toil/issues/4686 Toil can't manage Slurm partitions itself], we will use the <code>TOIL_SLURM_ARGS</code> environment variable to tell Toil how long jobs should be allowed to run for (2 hours) and what [[Slurm Queues (Partitions) and Resource Management | partition]] they should go in.
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --caching false --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
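For comparison, here is the same classification logic as a plain shell sketch, where <code>if</code> statements do have <code>else</code> branches. This is only an illustration of what the WDL conditionals compute, not part of the workflow:

```shell
# Classify one number the way the workflow does
# (assumption: to_fizz=3 and to_buzz=5, the workflow's defaults)
classify() {
  n=$1; to_fizz=3; to_buzz=5
  if [ $((n % to_fizz)) -eq 0 ] && [ $((n % to_buzz)) -eq 0 ]; then
    echo "FizzBuzz"
  elif [ $((n % to_fizz)) -eq 0 ]; then
    echo "Fizz"
  elif [ $((n % to_buzz)) -eq 0 ]; then
    echo "Buzz"
  else
    # Just a normal number
    echo "$n"
  fi
}
classify 15  # prints FizzBuzz
classify 9   # prints Fizz
classify 10  # prints Buzz
classify 7   # prints 7
```

In the WDL version there is no <code>else</code>, so each branch instead declares its own variable and <code>select_first()</code> picks the one that was actually set.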
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, at least in the cases where we called it instead of making a noise.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> function returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements, and to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, which may suggest that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need one, but you do need it in 1.1, and Toil doesn't actually send your outputs anywhere yet if you don't have one, so we're going to make one. We need to collect together all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> block, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --caching false --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
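The file written by <code>-m</code> is plain JSON, so you can pull individual outputs from it with ordinary tools once the run finishes. A sketch (using a faked-up results file, since your run's output will have 20 entries):

```shell
# Fake a results file with the same shape as what -m writes
# (your real fizzbuzz_out.json comes from the workflow run)
printf '%s\n' '{"FizzBuzz.fizzbuzz_results": ["1", "2", "Fizz", "4", "Buzz"]}' > fizzbuzz_out.json
# Print each result on its own line
python3 -c 'import json
for s in json.load(open("fizzbuzz_out.json"))["FizzBuzz.fizzbuzz_results"]:
    print(s)'
```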
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Restarting the Workflow==
If you think your workflow failed from a transient problem (such as a Docker image not being available) that you have fixed, and you ran the workflow with <code>--jobStore</code> set manually to a directory that persists between attempts, you can add <code>--restart</code> to your workflow command and make Toil try again. It will pick up from where it left off and rerun any failed tasks and then the rest of the workflow.
This ''will not'' pick up any changes to your WDL source code files; those are read once at the beginning and not re-read on restart.
If restarting the workflow doesn't help, you may need to move on to more advanced debugging techniques.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs and the output of commands run from WDL are reproduced like this.
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code. That happens either when the command line is written wrong, or when the error detection code in the tool you are trying to run detects and reports a problem.
Go up higher in the log until you find lines that look like:
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stderr follows:
And
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stdout follows:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
If you would like individual task logs to be saved separately for later reference, you can use the <code>--writeLogs</code> option to specify a directory to store them. For more information, see [https://toil.readthedocs.io/en/latest/wdl/running.html#managing-workflow-logs the Toil documentation of workflow task logs].
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this.
=== Automatically Fetching Input Files ===
The <code>toil debug-job</code> command has a <code>--retrieveTaskDirectory</code> option that lets you dump out a directory with all the files that a failing WDL task would use. You can use it like:
toil debug-job ./jobstore WDLTaskJob --retrieveTaskDirectory dumpdir
If there are multiple failing tasks, you might need to replace <code>WDLTaskJob</code> with the name of one of the failing jobs. See [https://toil.readthedocs.io/en/latest/running/debugging.html#fetching-job-inputs the Toil documentation on retrieving files] for more on how to use this command.
=== Manually Finding Input Files ===
If you can't use <code>toil debug-job</code>, you might need to manually dig through the job store for files. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
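The same search can be tried end-to-end against a throwaway directory standing in for a real job store; all the directory and file names below are made up for the demo.

```shell
# Stand-in for a real job store; a real one would be your --jobStore path,
# and the instance/file identifiers here are invented for the demo.
store=$(mktemp -d)
mkdir -p "$store/files/for-job/kind-WDLTaskJob/instance-abcd1234/file-ef567890"
touch "$store/files/for-job/kind-WDLTaskJob/instance-abcd1234/file-ef567890/Sample.bam"
# The same find invocation as above, pointed at the demo directory.
find "$store" -name "Sample.bam"
rm -rf "$store"
```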
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
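If you would rather stay on the command line, bash can do the same decoding and extraction. This sketch relies on bash-specific behavior (<code>printf '%b'</code> expanding <code>\x</code> escapes, and <code>${var//pat/rep}</code> substitution); the URI is the example from above.

```shell
uri='toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam'
# Turn each %XX escape into \xXX and let printf %b decode it (bash-specific).
decoded=$(printf '%b' "${uri//%/\\x}")
# The path relative to the job store is everything after the last colon.
echo "${decoded##*:}"
```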
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
You should upgrade Toil. [https://github.com/DataBiosphere/toil/commit/ff6bf60ab798a675c20156c749817c4313644b96 Since Toil 6.1.0], Toil no longer issues this warning, and just puts up with bad <code>XDG_RUNTIME_DIR</code> settings.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
c577a968ce491e06c778d3aa09a2e1b9d328e69a
Genomics Institute Computing Information
0
6
510
478
2024-07-26T15:44:48Z
Weiler
3
wikitext
text/x-wiki
Welcome to the Genomic Institute Computing Information Repository! Browse the below topics for help in the area you are curious about.
== GI Public Computing Environment ==
*[[How to access the public servers]]
== GI Firewalled Computing Environment (PRISM) ==
*[[Access to the Firewalled Compute Servers]]
*[[Firewalled Computing Resources Overview]]
*[[Firewalled Environment Storage Overview]]
*[[Firewalled User Account and Storage Cost]]
*[[Grafana Performance Metrics]]
*[[Visual Studio Code (vscode) Configuration Tweaks]]
[[http://logserv.gi.ucsc.edu/cgi-bin/private-groups.cgi|'''/private/groups''' Data Usage Graphs]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Cluster Etiquette]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[GPU Resources]]
*[[Quick Reference Guide]]
*[[Slurm Queues (Partitions) and Resource Management]]
*[[Slurm Tips for vg]]
*[[Slurm Tips for Toil]]
*[[Using Docker under Slurm]]
*[[Phoenix WDL Tutorial]]
==General Docker Information==
*[[Running a Container as a non-root User]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
73f4e9e4e801415bc20509936aa76a45b9113213
511
510
2024-07-26T15:45:05Z
Weiler
3
/* GI Firewalled Computing Environment (PRISM) */
wikitext
text/x-wiki
Welcome to the Genomic Institute Computing Information Repository! Browse the below topics for help in the area you are curious about.
== GI Public Computing Environment ==
*[[How to access the public servers]]
== GI Firewalled Computing Environment (PRISM) ==
*[[Access to the Firewalled Compute Servers]]
*[[Firewalled Computing Resources Overview]]
*[[Firewalled Environment Storage Overview]]
*[[Firewalled User Account and Storage Cost]]
*[[Grafana Performance Metrics]]
*[[Visual Studio Code (vscode) Configuration Tweaks]]
*[[http://logserv.gi.ucsc.edu/cgi-bin/private-groups.cgi|'''/private/groups''' Data Usage Graphs]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Cluster Etiquette]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[GPU Resources]]
*[[Quick Reference Guide]]
*[[Slurm Queues (Partitions) and Resource Management]]
*[[Slurm Tips for vg]]
*[[Slurm Tips for Toil]]
*[[Using Docker under Slurm]]
*[[Phoenix WDL Tutorial]]
==General Docker Information==
*[[Running a Container as a non-root User]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
c6bef10fdccbd35c1e6e89737614f6ec7eedbbe1
512
511
2024-07-26T15:45:16Z
Weiler
3
/* GI Firewalled Computing Environment (PRISM) */
wikitext
text/x-wiki
Welcome to the Genomic Institute Computing Information Repository! Browse the below topics for help in the area you are curious about.
== GI Public Computing Environment ==
*[[How to access the public servers]]
== GI Firewalled Computing Environment (PRISM) ==
*[[Access to the Firewalled Compute Servers]]
*[[Firewalled Computing Resources Overview]]
*[[Firewalled Environment Storage Overview]]
*[[Firewalled User Account and Storage Cost]]
*[[Grafana Performance Metrics]]
*[[Visual Studio Code (vscode) Configuration Tweaks]]
*[http://logserv.gi.ucsc.edu/cgi-bin/private-groups.cgi|'''/private/groups''' Data Usage Graphs]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Cluster Etiquette]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[GPU Resources]]
*[[Quick Reference Guide]]
*[[Slurm Queues (Partitions) and Resource Management]]
*[[Slurm Tips for vg]]
*[[Slurm Tips for Toil]]
*[[Using Docker under Slurm]]
*[[Phoenix WDL Tutorial]]
==General Docker Information==
*[[Running a Container as a non-root User]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
d08a965e06ef7424c2d3f35e2f7b77b5065f8ba2
513
512
2024-07-26T15:45:51Z
Weiler
3
/* GI Firewalled Computing Environment (PRISM) */
wikitext
text/x-wiki
Welcome to the Genomic Institute Computing Information Repository! Browse the below topics for help in the area you are curious about.
== GI Public Computing Environment ==
*[[How to access the public servers]]
== GI Firewalled Computing Environment (PRISM) ==
*[[Access to the Firewalled Compute Servers]]
*[[Firewalled Computing Resources Overview]]
*[[Firewalled Environment Storage Overview]]
*[[Firewalled User Account and Storage Cost]]
*[[Grafana Performance Metrics]]
*[[Visual Studio Code (vscode) Configuration Tweaks]]
*[http://logserv.gi.ucsc.edu/cgi-bin/private-groups.cgi '''/private/groups''' Data Usage Graphs]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Cluster Etiquette]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[GPU Resources]]
*[[Quick Reference Guide]]
*[[Slurm Queues (Partitions) and Resource Management]]
*[[Slurm Tips for vg]]
*[[Slurm Tips for Toil]]
*[[Using Docker under Slurm]]
*[[Phoenix WDL Tutorial]]
==General Docker Information==
*[[Running a Container as a non-root User]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
8088c00396e384cf1f6571389e89133104feaed9
518
513
2024-09-28T16:35:51Z
Weiler
3
/* Slurm at the Genomics Institute */
wikitext
text/x-wiki
Welcome to the Genomic Institute Computing Information Repository! Browse the below topics for help in the area you are curious about.
== GI Public Computing Environment ==
*[[How to access the public servers]]
== GI Firewalled Computing Environment (PRISM) ==
*[[Access to the Firewalled Compute Servers]]
*[[Firewalled Computing Resources Overview]]
*[[Firewalled Environment Storage Overview]]
*[[Firewalled User Account and Storage Cost]]
*[[Grafana Performance Metrics]]
*[[Visual Studio Code (vscode) Configuration Tweaks]]
*[http://logserv.gi.ucsc.edu/cgi-bin/private-groups.cgi '''/private/groups''' Data Usage Graphs]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Cluster Etiquette]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[GPU Resources]]
*[[Quick Reference Guide]]
*[[Convenient Slurm Commands]]
*[[Slurm Queues (Partitions) and Resource Management]]
*[[Slurm Tips for vg]]
*[[Slurm Tips for Toil]]
*[[Using Docker under Slurm]]
*[[Phoenix WDL Tutorial]]
==General Docker Information==
*[[Running a Container as a non-root User]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
8489b79202db3e917d07185ee20a98a6c8ddf72e
542
518
2025-01-19T22:27:19Z
Weiler
3
/* GI Firewalled Computing Environment (PRISM) */
wikitext
text/x-wiki
Welcome to the Genomics Institute Computing Information Repository! Browse the topics below for help in the area you are interested in.
== GI Public Computing Environment ==
*[[How to access the public servers]]
== GI Firewalled Computing Environment (PRISM) ==
*[[Access to the Firewalled Compute Servers]]
*[[Firewalled Computing Resources Overview]]
*[[Firewalled Environment Storage Overview]]
*[[Firewalled User Account and Storage Cost]]
*[[Grafana Performance Metrics]]
*[[Visual Studio Code (vscode) Configuration Tweaks]]
*[http://logserv.gi.ucsc.edu/cgi-bin/private-groups.cgi '''/private/groups''' Data Usage Graphs]
*[[Resetting your VPN/PRISM Password]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Cluster Etiquette]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[GPU Resources]]
*[[Quick Reference Guide]]
*[[Convenient Slurm Commands]]
*[[Slurm Queues (Partitions) and Resource Management]]
*[[Slurm Tips for vg]]
*[[Slurm Tips for Toil]]
*[[Using Docker under Slurm]]
*[[Phoenix WDL Tutorial]]
==General Docker Information==
*[[Running a Container as a non-root User]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
30b668a112aeb11f8370cf18ccc4fe760344ec05
Cluster Etiquette
0
47
514
483
2024-08-02T17:28:23Z
Weiler
3
wikitext
text/x-wiki
When running jobs on the cluster, you must be very aware of how those jobs will affect other users.
1: Always test your job by running one first. Just one. Note how much RAM, how many CPU cores, and how much time it takes to run. Then, when you submit 50 or 100 of those, you can set limits in your Slurm batch file on how long each job may run, how much RAM it may use, and how many cores it needs. That way, Slurm can stop jobs that inadvertently run too long or use too many resources.
2: Don't run too many jobs at once if they use a lot of disk I/O. If every job reads in a 100GB file and you launch 20 of them at the same time, you could bring down the /private/groups filesystem. In that case, run only around 5 at once, or introduce a random delay at the start of your jobs. You can limit your concurrent jobs by specifying something like this in your job batch file:
#SBATCH --array=[1-279]%10
inputList=$1
input=$(sed -n "$SLURM_ARRAY_TASK_ID"p $inputList)
some_command $input
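For the random-delay alternative mentioned above, a sleep at the top of the batch script is enough. This is a sketch: the 10-second cap is just for the demo, and a spread of minutes may be more appropriate when every job reads a very large file.

```shell
# Stagger start-up so simultaneous array tasks don't all hit the shared
# filesystem at once. RANDOM is a bash built-in (0-32767); the cap of 10
# seconds here is only for the demo.
delay=$((RANDOM % 10))
sleep "$delay"
echo "waited $delay seconds before starting I/O"
```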
3: Please do not pin cluster resources with interactive jobs and let them sit idle. Sometimes folks will open an interactive cluster job with a week-long runtime and just let it sit in order to "hold" a spot in the queue for when they might eventually want to run something. This wastes resources, and it forces others who have work ready to go to wait in the queue while nodes sit idle. If you use an interactive job via '''srun''' or '''salloc''', please start working in it as soon as it launches and close it as soon as your work is done.
4: Don't use too much storage. Use http://logserv.gi.ucsc.edu/cgi-bin/private-groups.cgi to look at how your storage use is divided among your directories, and clean up large chunks of data that you do not need.
fffe83f529aad9fcb81d089f4bb91b88746fb9eb
File:Ucsc gi private diagram.png
6
54
515
2024-09-25T16:29:08Z
Weiler
3
wikitext
text/x-wiki
da39a3ee5e6b4b0d3255bfef95601890afd80709
Firewalled Computing Resources Overview
0
41
516
502
2024-09-25T16:31:07Z
Weiler
3
wikitext
text/x-wiki
== Doing Work and Computing ==
When doing research, running jobs, and the like, please be careful of your resource consumption on the server you are on. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are not sure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Also, before starting your work, use the 'top' command to check who and what else is already running on the server and what resources are already being consumed. If, after starting a process, you realize that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the shared compute servers behind the VPN. The DNS suffix for all machines is ".prism". So, "mustard" would have a full DNS name of "mustard.prism":
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Node Name
! Operating System<br />
! CPU Cores
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | mustard
| style="text-align:left;" | Ubuntu 22.04
| 160
| 1.5 TB
| 10 Gb/s
| 9 TB
|-
| style="text-align:left;" | emerald
| style="text-align:left;" | Ubuntu 22.04
| 64
| 1 TB
| 10 Gb/s
| 690 GB
|-
| style="text-align:left;" | crimson
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|-
| style="text-align:left;" | razzmatazz
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|}
These ''shared'' servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== The Firewall ==
All servers in this environment are behind a firewall, so you must connect to the VPN in order to access them; they are not reachable from the greater Internet otherwise. You will, however, be able to make outbound connections from them to other servers on the Internet to copy data in, sync git repos, and the like; only inbound connections are blocked. All machines behind the firewall have the private domain name suffix "*.prism".
== The Phoenix Cluster ==
This is a cluster of 25 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2 TB of RAM and 256 cores, although the cluster is heterogeneous and has multiple node types. You interact with the Phoenix Cluster via the Slurm job scheduler. You must specifically request access to use Slurm on the Phoenix Cluster; email '''cluster-admin@soe.ucsc.edu''' for access.
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold;"
! Node Name
! Operating System<br />
! CPU Cores
! GPUs/Type
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | phoenix-00
| style="text-align:left;" | Ubuntu 22.04
| 256
| 8 / Nvidia A100
| 1 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[01-05]
| style="text-align:left;" | Ubuntu 22.04
| 256
| 8 / Nvidia A5500
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[06-08]
| style="text-align:left;" | Ubuntu 22.04
| 256
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[09-10]
| style="text-align:left;" | Ubuntu 22.04
| 384
| N/A
| 2.3 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[11-21]
| style="text-align:left;" | Ubuntu 22.04
| 256
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[22-24]
| style="text-align:left;" | Ubuntu 22.04
| 384
| N/A
| 2.3 TB
| 10 Gb/s
| 16 TB NVMe
|}
The cluster head node is '''phoenix.prism'''. However, to protect the scheduler from errant or runaway jobs, you cannot log in to phoenix.prism directly; instead, submit jobs from any interactive compute server (mustard, emerald, razzmatazz, or crimson). To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
For scratch on the cluster, TMPDIR will be set to /data/tmp (which is local to each cluster node). That area is cleaned often so don't store any data there that isn't being used by your jobs.
==Graphical Diagram of the Firewalled Area==
This is a general representation of how things look:
[[File:grafana_menu.png|900px]]
a46ad6b35ffe7f810d57a72ad20b9ef83bd374b6
517
516
2024-09-25T16:31:50Z
Weiler
3
wikitext
text/x-wiki
== Doing Work and Computing ==
When doing research, running jobs, and the like, please be careful of your resource consumption on the server you are on. Don't run so many threads or processes at once that you exhaust the available RAM or disk I/O. If you are not sure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Also, before starting your work, use the 'top' command to check who and what else is already running on the server and what resources are already being consumed. If, after starting a process, you realize that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
== Server Types and Management ==
After confirming your VPN software is working, you can ssh into one of the shared compute servers behind the VPN. The DNS suffix for all machines is ".prism". So, "mustard" would have a full DNS name of "mustard.prism":
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold; text-align:left;"
! Node Name
! Operating System<br />
! CPU Cores
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | mustard
| style="text-align:left;" | Ubuntu 22.04
| 160
| 1.5 TB
| 10 Gb/s
| 9 TB
|-
| style="text-align:left;" | emerald
| style="text-align:left;" | Ubuntu 22.04
| 64
| 1 TB
| 10 Gb/s
| 690 GB
|-
| style="text-align:left;" | crimson
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|-
| style="text-align:left;" | razzmatazz
| style="text-align:left;" | Ubuntu 22.04
| 32
| 256 GB
| 10 Gb/s
| 5.5 TB
|}
These ''shared'' servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on any of these servers, please make your request by emailing cluster-admin@soe.ucsc.edu.
== The Firewall ==
All servers in this environment are behind a firewall, so you must connect to the VPN in order to access them; they are not reachable from the greater Internet otherwise. You will, however, be able to make outbound connections from them to other servers on the Internet to copy data in, sync git repos, and the like; only inbound connections are blocked. All machines behind the firewall have the private domain name suffix "*.prism".
== The Phoenix Cluster ==
This is a cluster of 25 Ubuntu 22.04 nodes, some of which have GPUs in them. Each node generally has about 2 TB of RAM and 256 cores, although the cluster is heterogeneous and has multiple node types. You interact with the Phoenix Cluster via the Slurm job scheduler. You must specifically request access to use Slurm on the Phoenix Cluster; email '''cluster-admin@soe.ucsc.edu''' for access.
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold;"
! Node Name
! Operating System<br />
! CPU Cores
! GPUs/Type
! Memory
! Network Bandwidth
! Scratch Space
|-
| style="text-align:left;" | phoenix-00
| style="text-align:left;" | Ubuntu 22.04
| 256
| 8 / Nvidia A100
| 1 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[01-05]
| style="text-align:left;" | Ubuntu 22.04
| 256
| 8 / Nvidia A5500
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[06-08]
| style="text-align:left;" | Ubuntu 22.04
| 256
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[09-10]
| style="text-align:left;" | Ubuntu 22.04
| 384
| N/A
| 2.3 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[11-21]
| style="text-align:left;" | Ubuntu 22.04
| 256
| N/A
| 2 TB
| 10 Gb/s
| 16 TB NVMe
|-
| style="text-align:left;" | phoenix-[22-24]
| style="text-align:left;" | Ubuntu 22.04
| 384
| N/A
| 2.3 TB
| 10 Gb/s
| 16 TB NVMe
|}
The cluster head node is '''phoenix.prism'''. However, to protect the scheduler from errant or runaway jobs, you cannot log in to phoenix.prism directly; instead, submit jobs from any interactive compute server (mustard, emerald, razzmatazz, or crimson). To learn more about how to use Slurm, refer to:
https://giwiki.gi.ucsc.edu/index.php/Genomics_Institute_Computing_Information#Slurm_at_the_Genomics_Institute
For scratch on the cluster, TMPDIR will be set to /data/tmp (which is local to each cluster node). That area is cleaned often so don't store any data there that isn't being used by your jobs.
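For example, a job might do its heavy I/O in a per-job directory under <code>$TMPDIR</code> and copy only final results back to shared storage. This is a sketch with placeholder paths (and a /tmp fallback so it runs anywhere); the final copy step is shown as an echo.

```shell
# Make a private working directory under node-local scratch.
# On the cluster TMPDIR is /data/tmp; fall back to /tmp for the demo.
workdir="${TMPDIR:-/tmp}/myjob.$$"
mkdir -p "$workdir"
cd "$workdir"
echo "intermediate data" > scratch.txt   # heavy I/O happens on local disk
# When done, copy results to shared storage (destination is a placeholder).
echo "would copy $workdir/scratch.txt to /private/groups/mygroup/results/"
rm -rf "$workdir"   # clean up scratch when finished
```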
==Graphical Diagram of the Firewalled Area==
This is a general representation of how things look:
[[File:Ucsc_gi_private_diagram.png|900px]]
c36570b83bd42e277f37231cff0cb9b45f8542c6
Convenient Slurm Commands
0
55
519
2024-09-28T16:40:31Z
Weiler
3
Created page with "__TOC__ ==General commands== Get documentation on a command: man <command> Try the following commands: man sbatch man squeue man scancel ==Submitting Jobs== The following example script specifies a partition, time limit, memory allocation and number of cores. All your scripts should specify values for these four parameters. You can also set additional parameters as shown, such as jobname and output file. For This script performs a simple task — it generates of..."
wikitext
text/x-wiki
__TOC__
==General commands==
Get documentation on a command:
man <command>
Try the following commands:
man sbatch
man squeue
man scancel
==Submitting Jobs==
The following example script specifies a partition, time limit, memory allocation, and number of cores. All your scripts should specify values for these four parameters. You can also set additional parameters as shown, such as job name and output file. This script performs a simple task: it generates a file of random numbers and then sorts it.
#!/bin/bash
#
#SBATCH -p shared # partition (queue)
#SBATCH -c 1 # number of cores
#SBATCH --mem 100 # memory pool for all cores
#SBATCH -t 0-2:00 # time (D-HH:MM)
#SBATCH -o slurm.%N.%j.out # STDOUT
#SBATCH -e slurm.%N.%j.err # STDERR
for i in {1..100000}; do
echo $RANDOM >> SomeRandomNumbers.txt
done
sort SomeRandomNumbers.txt
Now you can submit your job with the command:
sbatch myscript.sh
If you want to test your job and find out when it is estimated to run, use the following (note that this does not actually submit the job):
sbatch --test-only myscript.sh
==Information On Jobs==
List all current jobs for a user:
squeue -u <username>
List all running jobs for a user:
squeue -u <username> -t RUNNING
List all pending jobs for a user:
squeue -u <username> -t PENDING
List priority order of jobs for the current user (you) in a given partition:
showq-slurm -o -u -q <partition>
List jobs run by the current user since a certain date:
sacct --starttime <YYYY-MM-DD>
List jobs run by a user during an interval marked by a start, -S, and an end, -E, date along with the information on the job id, the allocated node, partition, number of allocated CPUs, state of the job, and the start time of the job:
sacct -S <YYYY-MM-DD> -E <YYYY-MM-DD> -u <username> --format=JobID,nodelist,Partition,AllocCPUs,State,start
If the end date is left out, then the sacct command will list the jobs starting from the start date until now.
List detailed information for a currently running job (useful for troubleshooting):
scontrol show jobid -dd <jobid>
List status info for a currently running job:
sstat --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID -j <jobid> --allsteps
To view the command line argument at the time of submission of a job:
sacct -j <jobid> -o submitline -P
To see the batch script of a submitted job:
sacct -j <jobid> --batch
Once your job has completed, you can get additional information that was not available during the run. This includes run time, memory used, etc.
To get statistics on both completed jobs and currently running jobs by jobID:
sacct -j <jobid> --format=JobID,JobName,MaxRSS,Elapsed,nodelist -X
To view the same information for all jobs of a user:
sacct -u <username> --format=JobID,JobName,MaxRSS,Elapsed
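sacct reports MaxRSS in kilobytes with a <code>K</code> suffix. If you want the figure in megabytes, a short awk filter will convert it; the sample lines below stand in for real sacct output, since exact columns vary by site and format string:

```shell
# Convert the MaxRSS column (3rd field here) of sacct-style output to MB.
# The sample input stands in for:
#   sacct -u <username> --format=JobID,JobName,MaxRSS,Elapsed
printf '%s\n' \
  '123 align 2048000K 01:02:03' \
  '124 sort   512000K 00:10:00' |
awk '{ sub(/K$/, "", $3); printf "%s %s %.0fMB %s\n", $1, $2, $3/1024, $4 }'
```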
==Controlling jobs==
To cancel one job:
scancel <jobid>
To cancel all the jobs for a user:
scancel -u <username>
To cancel all the pending jobs for a user:
scancel -t PENDING -u <username>
To cancel one or more jobs by name:
scancel --name myJobName
To hold a particular job from being scheduled:
scontrol hold <jobid>
To release a particular job to be scheduled:
scontrol release <jobid>
To requeue (cancel and rerun) a particular job:
scontrol requeue <jobid>
==Job Arrays and Useful Commands==
As shown in the commands above, it's easy to refer to one job by its job ID, or to all of your jobs via your username. What if you want to refer to a subset of your jobs? The answer is to submit the set as a job array; you can then use the job array ID to refer to the whole set when running SLURM commands.
SLURM job arrays
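A job array is submitted with sbatch's <code>--array</code> option; each task sees its own index in <code>$SLURM_ARRAY_TASK_ID</code>. As a sketch (the script name and index range are illustrative):

```shell
#!/bin/bash
#SBATCH -o slurm.%A_%a.out   # %A = array job ID, %a = array task index

# Each array task selects its own piece of work by index.
# The :-1 default is only so this sketch also runs outside Slurm,
# where SLURM_ARRAY_TASK_ID is not set.
echo "Processing chunk ${SLURM_ARRAY_TASK_ID:-1}"
```

Submitting it with <code>sbatch --array=1-10 myscript.sh</code> launches ten tasks; the array job ID that sbatch reports is what you pass to the commands below.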
To cancel an indexed job in a job array:
scancel <jobid>_<index>
e.g.
scancel 1234_4
To find the original submit time for your job array:
sacct -j <jobid> -o submit -X --noheader | uniq
==Advanced (but useful!) Commands==
The following commands work for individual jobs and for job arrays, and allow easy manipulation of large numbers of jobs. You can combine these commands with the parameters shown above to provide great flexibility and precision in job control. (Note that all of these commands are entered on one line)
Suspend all running jobs for a user (takes into account job arrays):
squeue -ho %A -u <username> -t R | xargs -n 1 scontrol suspend
Resume all suspended jobs for a user:
squeue -o "%.18A %.18t" -u <username> | awk '{if ($2 =="S"){print $1}}' | xargs -n 1 scontrol resume
After resuming, check if any are still suspended:
squeue -ho %A -u $USER -t S | wc -l
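To see what the resume pipeline above is doing, here is the awk filter applied to canned <code>squeue -o "%.18A %.18t"</code>-style output (job ID, then state code); it keeps only the IDs of suspended (S) jobs, which xargs then hands to <code>scontrol resume</code> one at a time:

```shell
# Fake squeue output: one job running (R), two suspended (S).
printf '%s\n' \
  '5244464 R' \
  '5244465 S' \
  '5244466 S' |
awk '{ if ($2 == "S") { print $1 } }'
# prints 5244465 and 5244466, one per line
```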
3c945f8d1b0193e02f3df778179d8f85c4af1fdf
Slurm Tips for vg
0
37
520
455
2024-10-11T17:05:25Z
Anovak
4
/* Misc Tips */ Discourage interactive shells
wikitext
text/x-wiki
This page explains how to set up a development environment for [https://github.com/vgteam/vg vg] on the Phoenix cluster.
==Setting Up==
1. After connecting to the VPN, connect to the cluster head node:
ssh phoenix.prism
This node is relatively small, so you shouldn't run real work on it, but it is the place you need to be to submit Slurm jobs.
2. Make yourself a user directory under '''/private/groups''', which is where large data must be stored. For example, if you are in the Paten lab:
mkdir /private/groups/patenlab/$USER
3. (Optional) Link it into your home directory, so it is easy to keep your repos on that storage. The '''/private/groups''' storage may be faster than the home directory storage.
mkdir -p /private/groups/patenlab/$USER/workspace
ln -s /private/groups/patenlab/$USER/workspace ~/workspace
4. Make sure you have SSH keys created and add them to Github.
cat ~/.ssh/id_ed25519.pub || (ssh-keygen -t ed25519 && cat ~/.ssh/id_ed25519.pub)
# Paste into https://github.com/settings/ssh/new
5. Make a place to put your clone, and clone vg:
mkdir -p ~/workspace
cd ~/workspace
git clone --recursive git@github.com:vgteam/vg.git
cd vg
6. vg's dependencies should already be installed on the cluster nodes. If any of them seem to be missing, tell cluster-admin@soe.ucsc.edu to install them.
7. Build vg as a Slurm job. This will send the build out to the cluster as a 64-core, 80G memory job, and keep the output logs in your terminal.
srun -c 64 --mem=80G --time=00:30:00 make -j64
This will leave your vg binary at '''~/workspace/vg/bin/vg'''.
==Misc Tips==
* For a lightweight job that outputs to your terminal or that can be waited for in a Bash script, run an individual command directly from <code>srun</code>:
srun -c1 --mem 2G --partition short --time 1:00:00 sleep 10
* If you need to run a few commands in the same shell, use <code>sbatch --wrap</code>:
sbatch -c1 --mem 2G --partition short --time 1:00:00 --wrap ". venv/bin/activate; ./script1.py && ./script2.py"
* To watch a batch job's output live, look at the <code>Submitted batch job 5244464</code> line from <code>sbatch</code> and run:
tail -f slurm-5244464.out
* '''Danger!''' If you ''really'' need an interactive session with appreciable resources, you can schedule one with <code>srun --pty</code>. But it is '''very easy''' to waste resources like this, since the job will happily sit there not doing anything until it hits the timeout. Only do this for testing! For real work, use one of the other methods!
srun -c 16 --mem 120G --time=08:00:00 --partition=medium --pty bash -i
* To send out a job without making a script file for it, use '''sbatch --wrap "your command here"'''.
* You can use arguments from SBATCH lines on the command line!
* You can use [https://github.com/CLIP-HPC/SlurmCommander#readme Slurm Commander] to watch the state of the cluster with the '''scom''' command.
cf2daaa74f40e9df314286dd7b7fd5cae5a49442
540
520
2025-01-16T16:58:39Z
Anovak
4
/* Setting Up */
wikitext
text/x-wiki
This page explains how to set up a development environment for [https://github.com/vgteam/vg vg] on the Phoenix cluster.
==Setting Up==
1. After connecting to the VPN, connect to an interactive node:
ssh razzmatazz.prism
This node is relatively small, so you shouldn't run real work on it, but it is the place you need to be to submit Slurm jobs.
2. Make yourself a user directory under '''/private/groups''', which is where large data must be stored. For example, if you are in the Paten lab:
mkdir /private/groups/patenlab/$USER
3. (Optional) Link it into your home directory, so it is easy to keep your repos on that storage. The '''/private/groups''' storage may be faster than the home directory storage.
mkdir -p /private/groups/patenlab/$USER/workspace
ln -s /private/groups/patenlab/$USER/workspace ~/workspace
4. Make sure you have SSH keys created and add them to Github.
cat ~/.ssh/id_ed25519.pub || (ssh-keygen -t ed25519 && cat ~/.ssh/id_ed25519.pub)
# Paste into https://github.com/settings/ssh/new
5. Make a place to put your clone, and clone vg:
mkdir -p ~/workspace
cd ~/workspace
git clone --recursive git@github.com:vgteam/vg.git
cd vg
6. vg's dependencies should already be installed on the cluster nodes. If any of them seem to be missing, tell cluster-admin@soe.ucsc.edu to install them.
7. Build vg as a Slurm job. This will send the build out to the cluster as a 64-core, 80G memory job, and keep the output logs in your terminal.
srun -c 64 --mem=80G --time=00:30:00 make -j64
This will leave your vg binary at '''~/workspace/vg/bin/vg'''.
==Misc Tips==
* For a lightweight job that outputs to your terminal or that can be waited for in a Bash script, run an individual command directly from <code>srun</code>:
srun -c1 --mem 2G --partition short --time 1:00:00 sleep 10
* If you need to run a few commands in the same shell, use <code>sbatch --wrap</code>:
sbatch -c1 --mem 2G --partition short --time 1:00:00 --wrap ". venv/bin/activate; ./script1.py && ./script2.py"
* To watch a batch job's output live, look at the <code>Submitted batch job 5244464</code> line from <code>sbatch</code> and run:
tail -f slurm-5244464.out
* '''Danger!''' If you ''really'' need an interactive session with appreciable resources, you can schedule one with <code>srun --pty</code>. But it is '''very easy''' to waste resources like this, since the job will happily sit there not doing anything until it hits the timeout. Only do this for testing! For real work, use one of the other methods!
srun -c 16 --mem 120G --time=08:00:00 --partition=medium --pty bash -i
* To send out a job without making a script file for it, use '''sbatch --wrap "your command here"'''.
* You can use arguments from SBATCH lines on the command line!
* You can use [https://github.com/CLIP-HPC/SlurmCommander#readme Slurm Commander] to watch the state of the cluster with the '''scom''' command.
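To illustrate the point about SBATCH lines and command-line arguments: resource requests can live in the script as #SBATCH directives or be passed to sbatch directly, and flags given on the command line take precedence. A sketch (the script name '''myjob.sh''' is arbitrary):

```shell
# Option 1: resource requests embedded in the job script.
cat > myjob.sh <<'EOF'
#!/bin/bash
#SBATCH -c 4
#SBATCH --mem 8G
#SBATCH --time 1:00:00
echo "running on $(hostname)"
EOF

# Option 2: the same requests as sbatch flags; command-line flags
# override any #SBATCH lines in the script.
#   sbatch -c 4 --mem 8G --time 1:00:00 myjob.sh
```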
e2c35ab93bae6e4cdda04a0dbe7e53ffd387267c
AWS Account List and Numbers
0
22
521
488
2024-10-11T20:24:54Z
Weiler
3
wikitext
text/x-wiki
This is a list of our AWS accounts and their account numbers; entries marked "(Removed)" are no longer active:
ucsc-bd2k : 862902209576
ucsc-toil-dev : 318423852362
ucsc-vg-dev : 781907127277
ucsc-platform-dev : 719818754276
comparative-genomics-dev : 162786355865
nanopore-dev : 270442831226
ucsc-cgp-production : 097093801910 (Removed)
platform-hca-dev : 122796619775
anvil-dev : 608666466534 (Removed)
gi-gateway : 652235167018
pangenomics : 422448306679
braingeneers : 443872533066
ucsctreehouse : 238605363322
ucsc-bisti-dev : 851631505710 (Removed)
ucsc-genome-browser : 784962239183
dockstore-dev : 635220370222
ucsc-spatial : 541180793903 (Removed)
platform-hca-prod : 542754589326
platform-hca-portal : 158963592881
miga-lab : 156518225147
platform-anvil-dev : 289950828509
platform-anvil-prod : 465330168186
platform-anvil-portal : 166384485414
platform-temp-dev : 654654270592
agc-runs : 598929688444
sequencing-center-cold-store : 436140841220
hprc-training : 654654365441
e831e055591fb888834bbd0b90d00749b9db1aab
TFL
0
56
522
2024-10-13T15:06:55Z
Weiler
3
Created page with "__TOC__ Here are the instructions to prepare! Let me know if anyone has any issues, we can work through it. 1: Find the "Terminal" application on your Mac. Every Mac has it. You can do a search for it in "spotlight" (the little magnifying glass on the top right of every Mac Desktop screen). It's also in your applications folder (in the "Utilities" sub folder). Once you find it, drag the application icon to your dock to make a shortcut for it."
wikitext
text/x-wiki
__TOC__
Here are the instructions to prepare! Let me know if anyone has any issues, we can work through it.
1: Find the "Terminal" application on your Mac. Every Mac has it. You can do a search for it in "spotlight" (the little magnifying glass on the top right of every Mac Desktop screen). It's also in your applications folder (in the "Utilities" sub folder). Once you find it, drag the application icon to your dock to make a shortcut for it.
4a3ae6f0989f3071d12475ccc8a8ff1d05125bed
524
522
2024-10-13T15:09:59Z
Weiler
3
wikitext
text/x-wiki
__TOC__
Here are the instructions to prepare! Let me know if anyone has any issues, we can work through it.
1: Find the "Terminal" application on your Mac. Every Mac has it. You can do a search for it in "spotlight" (the little magnifying glass on the top right of every Mac Desktop screen). It's also in your applications folder (in the "Utilities" sub folder). Once you find it, drag the application icon to your dock to make a shortcut for it.
[[File:Terminal.png|900px]]
c01580f32e7e7280066ee874eb565414b61e77e9
525
524
2024-10-13T15:22:10Z
Weiler
3
wikitext
text/x-wiki
__TOC__
Here are the instructions to prepare! Let me know if anyone has any issues, we can work through it.
== Download the Bot Script that I Emailed You ==
It should be called '''reserve_tfl_bot.py'''. Save this to your Desktop (which is located as '''/Users/your_user_name/Desktop''').
== Find the Terminal Application ==
Find the "Terminal" application on your Mac. Every Mac has it. You can do a search for it in "spotlight" (the little magnifying glass on the top right of every Mac Desktop screen). It's also in your applications folder (in the "Utilities" sub folder). Once you find it, drag the application icon to your dock to make a shortcut for it.
[[File:Terminal.png|900px]]
== Prepare the '''reserve_tfl_bot.py''' Script ==
Once you have the terminal open, it will look like a white box waiting for you to type in text. This is what is called a "UNIX command line". There are only a couple things you will need to type in here to prepare.
The first thing you should do in the terminal is navigate to the Desktop. This basically changes your current working directory to be your Desktop in the terminal:
cd Desktop
Then run this command:
chmod 755 ./reserve_tfl_bot.py
That second command makes the script '''executable''', which means we can now run the program. Neither command gives much feedback; they should simply work without printing any errors.
The next thing is that you should type in '''pip3 install selenium'''.
83baa305df7d3b65f8a363abef0d33b4239a5b16
527
525
2024-10-13T15:23:07Z
Weiler
3
wikitext
text/x-wiki
__TOC__
Here are the instructions to prepare! Let me know if anyone has any issues, we can work through it.
== Download the Bot Script that I Emailed You ==
It should be called '''reserve_tfl_bot.py'''. Save this to your Desktop (which is located as '''/Users/your_user_name/Desktop''').
== Find the Terminal Application ==
Find the "Terminal" application on your Mac. Every Mac has it. You can do a search for it in "spotlight" (the little magnifying glass on the top right of every Mac Desktop screen). It's also in your applications folder (in the "Utilities" sub folder). Once you find it, drag the application icon to your dock to make a shortcut for it.
[[File:Terminal.png|900px]]
== Prepare the '''reserve_tfl_bot.py''' Script ==
Once you have the terminal open, it will look like a white box waiting for you to type in text. This is what is called a "UNIX command line". There are only a couple things you will need to type in here to prepare.
The first thing you should do in the terminal is navigate to the Desktop. This basically changes your current working directory to be your Desktop in the terminal:
cd Desktop
Then run this command:
chmod 755 ./reserve_tfl_bot.py
That second command makes the script '''executable''', which means we can now run the program. Neither command gives much feedback; they should simply work without printing any errors.
[[File:Term2.png|900px]]
The next thing is that you should type in '''pip3 install selenium'''.
1cb4aea6b6c86e47f832f963912ca1019ab5d1bf
528
527
2024-10-13T15:28:39Z
Weiler
3
wikitext
text/x-wiki
__TOC__
Here are the instructions to prepare! Let me know if anyone has any issues, we can work through it.
== Download the Bot Script that I Emailed You ==
It should be called '''reserve_tfl_bot.py'''. Save this to your Desktop (which is located as '''/Users/your_user_name/Desktop''').
== Find the Terminal Application ==
Find the "Terminal" application on your Mac. Every Mac has it. You can do a search for it in "spotlight" (the little magnifying glass on the top right of every Mac Desktop screen). It's also in your applications folder (in the "Utilities" sub folder). Once you find it, drag the application icon to your dock to make a shortcut for it.
[[File:Terminal.png|900px]]
== Prepare the '''reserve_tfl_bot.py''' Script ==
Once you have the terminal open, it will look like a white box waiting for you to type in text. This is what is called a "UNIX command line". There are only a couple things you will need to type in here to prepare.
The first thing you should do in the terminal is navigate to the Desktop. This basically changes your current working directory to be your Desktop in the terminal:
cd Desktop
Then run this command:
chmod 755 ./reserve_tfl_bot.py
That second command makes the script '''executable''', which means we can now run the program. Neither command gives much feedback; they should simply work without printing any errors.
[[File:Term2.png|900px]]
== Install Selenium ==
The next thing is that you should type in '''pip3 install selenium'''. This is basically installing a Python module that will enable the script to control a Google Chrome window. It should run without any errors, and should look something like this:
6019b6f33558611b10329e4356e14e5c1b0aafd3
530
528
2024-10-13T15:29:36Z
Weiler
3
wikitext
text/x-wiki
__TOC__
Here are the instructions to prepare! Let me know if anyone has any issues, we can work through it.
== Download the Bot Script that I Emailed You ==
It should be called '''reserve_tfl_bot.py'''. Save this to your Desktop (which is located as '''/Users/your_user_name/Desktop''').
== Find the Terminal Application ==
Find the "Terminal" application on your Mac. Every Mac has it. You can do a search for it in "spotlight" (the little magnifying glass on the top right of every Mac Desktop screen). It's also in your applications folder (in the "Utilities" sub folder). Once you find it, drag the application icon to your dock to make a shortcut for it.
[[File:Terminal.png|900px]]
== Prepare the '''reserve_tfl_bot.py''' Script ==
Once you have the terminal open, it will look like a white box waiting for you to type in text. This is what is called a "UNIX command line". There are only a couple things you will need to type in here to prepare.
The first thing you should do in the terminal is navigate to the Desktop. This basically changes your current working directory to be your Desktop in the terminal:
cd Desktop
Then run this command:
chmod 755 ./reserve_tfl_bot.py
That second command makes the script '''executable''', which means we can now run the program. Neither command gives much feedback; they should simply work without printing any errors.
[[File:Term2.png|900px]]
== Install Selenium ==
The next thing is that you should type in '''pip3 install selenium'''. This is basically installing a Python module that will enable the script to control a Google Chrome window. It should run without any errors, and should look something like this:
[[File:Selenium.png|900px]]
143aa23ac6bcc963e9c66dd67cd945ed1b00bd82
531
530
2024-10-13T15:31:12Z
Weiler
3
/* Install Selenium */
wikitext
text/x-wiki
__TOC__
Here are the instructions to prepare! Let me know if anyone has any issues, we can work through it.
== Download the Bot Script that I Emailed You ==
It should be called '''reserve_tfl_bot.py'''. Save this to your Desktop (which is located as '''/Users/your_user_name/Desktop''').
== Find the Terminal Application ==
Find the "Terminal" application on your Mac. Every Mac has it. You can do a search for it in "spotlight" (the little magnifying glass on the top right of every Mac Desktop screen). It's also in your applications folder (in the "Utilities" sub folder). Once you find it, drag the application icon to your dock to make a shortcut for it.
[[File:Terminal.png|900px]]
== Prepare the '''reserve_tfl_bot.py''' Script ==
Once you have the terminal open, it will look like a white box waiting for you to type in text. This is what is called a "UNIX command line". There are only a couple things you will need to type in here to prepare.
The first thing you should do in the terminal is navigate to the Desktop. This basically changes your current working directory to be your Desktop in the terminal:
cd Desktop
Then run this command:
chmod 755 ./reserve_tfl_bot.py
That second command makes the script '''executable''', which means we can now run the program. Neither command gives much feedback; they should simply work without printing any errors.
[[File:Term2.png|900px]]
== Install Selenium ==
The next thing you need to type in is '''pip3 install selenium'''. This is basically installing a Python module that will enable the script to control a Google Chrome window. It should run without any errors, and should look something like this:
[[File:Selenium.png|900px]]
9f66f994629742bcee4c6f72f124df43a71f4fbc
532
531
2024-10-13T15:43:43Z
Weiler
3
wikitext
text/x-wiki
__TOC__
Here are the instructions to prepare! Let me know if anyone has any issues, we can work through it.
== Download the Bot Script that I Emailed You ==
It should be called '''reserve_tfl_bot.py'''. Save this to your Desktop (which is located as '''/Users/your_user_name/Desktop''').
== Find the Terminal Application ==
Find the "Terminal" application on your Mac. Every Mac has it. You can do a search for it in "spotlight" (the little magnifying glass on the top right of every Mac Desktop screen). It's also in your applications folder (in the "Utilities" sub folder). Once you find it, drag the application icon to your dock to make a shortcut for it.
[[File:Terminal.png|900px]]
== Prepare the '''reserve_tfl_bot.py''' Script ==
Once you have the terminal open, it will look like a white box waiting for you to type in text. This is what is called a "UNIX command line". There are only a couple things you will need to type in here to prepare.
The first thing you should do in the terminal is navigate to the Desktop. This basically changes your current working directory to be your Desktop in the terminal:
cd Desktop
Then run this command:
chmod 755 ./reserve_tfl_bot.py
That second command makes the script '''executable''', which means we can now run the program. Neither command gives much feedback; they should simply work without printing any errors.
[[File:Term2.png|900px]]
== Install Selenium ==
The next thing you need to type in is '''pip3 install selenium'''. This is basically installing a Python module that will enable the script to control a Google Chrome window. It should run without any errors, and should look something like this:
[[File:Selenium.png|900px]]
== Run the Bot! ==
If you have any Terminal windows open from before, let's close them and open a new one. Once you have a new Terminal window open, you can test your bot code by doing:
cd Desktop
./reserve_tfl_bot.py
4f4a516f7b48258d9049a728ad46d40700cfb4c7
533
532
2024-10-13T16:15:45Z
Weiler
3
/* Run the Bot! */
wikitext
text/x-wiki
__TOC__
Here are the instructions to prepare! Let me know if anyone has any issues, we can work through it.
== Download the Bot Script that I Emailed You ==
It should be called '''reserve_tfl_bot.py'''. Save this to your Desktop (which is located as '''/Users/your_user_name/Desktop''').
== Find the Terminal Application ==
Find the "Terminal" application on your Mac. Every Mac has it. You can do a search for it in "spotlight" (the little magnifying glass on the top right of every Mac Desktop screen). It's also in your applications folder (in the "Utilities" sub folder). Once you find it, drag the application icon to your dock to make a shortcut for it.
[[File:Terminal.png|900px]]
== Prepare the '''reserve_tfl_bot.py''' Script ==
Once you have the terminal open, it will look like a white box waiting for you to type in text. This is what is called a "UNIX command line". There are only a couple things you will need to type in here to prepare.
The first thing you should do in the terminal is navigate to the Desktop. This basically changes your current working directory to be your Desktop in the terminal:
cd Desktop
Then run this command:
chmod 755 ./reserve_tfl_bot.py
That second command makes the script '''executable''', which means we can now run the program. Neither command gives much feedback; they should simply work without printing any errors.
[[File:Term2.png|900px]]
== Install Selenium ==
The next thing you need to type in is '''pip3 install selenium'''. This is basically installing a Python module that will enable the script to control a Google Chrome window. It should run without any errors, and should look something like this:
[[File:Selenium.png|900px]]
== Run the Bot! ==
If you have any Terminal windows open from before, let's close them and open a new one. Once you open a new Terminal window, you can test your bot code by doing:
cd Desktop
./reserve_tfl_bot.py
It will spew a bunch of text to your terminal window, but the main thing is that it should '''also''' open a Google Chrome window on the TFL reservation page, quickly check available dates and times, and, if it doesn't find any, quickly close the Chrome window, then open a new one and repeat until it gets a hit. If it finds a time, it will stop and wait for you to enter name and contact information, credit card number, etc. If you get to that point, I will send you my credit card info immediately so you can enter it.
If you need to get it to stop checking, which you will need to do to stop testing, or if we "win" and get a reservation, just click on the terminal window, and hit "Control C" a bunch of times, and it should stop.
When the time comes, we will all run this step about 30 seconds before 10:00am on November 1st. Then we wait and cross our fingers!
fa888499966ac996a00c0fb62a4df5eceea0388d
534
533
2024-10-13T16:16:53Z
Weiler
3
/* Install Selenium */
wikitext
text/x-wiki
__TOC__
Here are the instructions to prepare! Let me know if anyone has any issues, we can work through it.
== Download the Bot Script that I Emailed You ==
It should be called '''reserve_tfl_bot.py'''. Save this to your Desktop (which is located as '''/Users/your_user_name/Desktop''').
== Find the Terminal Application ==
Find the "Terminal" application on your Mac. Every Mac has it. You can do a search for it in "spotlight" (the little magnifying glass on the top right of every Mac Desktop screen). It's also in your applications folder (in the "Utilities" sub folder). Once you find it, drag the application icon to your dock to make a shortcut for it.
[[File:Terminal.png|900px]]
== Prepare the '''reserve_tfl_bot.py''' Script ==
Once you have the terminal open, it will look like a white box waiting for you to type in text. This is what is called a "UNIX command line". There are only a couple things you will need to type in here to prepare.
The first thing you should do in the terminal is navigate to the Desktop. This basically changes your current working directory to be your Desktop in the terminal:
cd Desktop
Then run this command:
chmod 755 ./reserve_tfl_bot.py
That second command makes the script '''executable''', which means we can now run the program. Neither command gives much feedback; they should simply work without printing any errors.
[[File:Term2.png|900px]]
== Install Selenium ==
The next thing you need to type in is:
pip3 install selenium
This is basically installing a Python module that will enable the script to control a Google Chrome window. It should run without any errors, and should look something like this:
[[File:Selenium.png|900px]]
== Run the Bot! ==
If you have any Terminal windows open from before, let's close them and open a new one. Once you open a new Terminal window, you can test your bot code by doing:
cd Desktop
./reserve_tfl_bot.py
It will spew a bunch of text to your terminal window, but the main thing is that it should '''also''' open a Google Chrome window on the TFL reservation page, quickly check available dates and times, and, if it doesn't find any, quickly close the Chrome window, then open a new one and repeat until it gets a hit. If it finds a time, it will stop and wait for you to enter name and contact information, credit card number, etc. If you get to that point, I will send you my credit card info immediately so you can enter it.
If you need to get it to stop checking, which you will need to do to stop testing, or if we "win" and get a reservation, just click on the terminal window, and hit "Control C" a bunch of times, and it should stop.
When the time comes, we will all run this step about 30 seconds before 10:00am on November 1st. Then we wait and cross our fingers!
816bfa80f7f47c5f954da403add671c00e1d6acc
535
534
2024-10-13T16:37:53Z
Weiler
3
wikitext
text/x-wiki
__TOC__
Here are the instructions to prepare! Let me know if anyone has any issues, we can work through it. Make sure you have Google Chrome installed before doing these steps.
== Download the Bot Script that I Emailed You ==
It should be called '''reserve_tfl_bot.py'''. Save this to your Desktop (which is located as '''/Users/your_user_name/Desktop''').
== Find the Terminal Application ==
Find the "Terminal" application on your Mac. Every Mac has it. You can do a search for it in "spotlight" (the little magnifying glass on the top right of every Mac Desktop screen). It's also in your applications folder (in the "Utilities" sub folder). Once you find it, drag the application icon to your dock to make a shortcut for it.
[[File:Terminal.png|900px]]
== Prepare the '''reserve_tfl_bot.py''' Script ==
Once you have the terminal open, it will look like a white box waiting for you to type in text. This is what is called a "UNIX command line". There are only a couple things you will need to type in here to prepare.
The first thing you should do in the terminal is navigate to the Desktop. This basically changes your current working directory to be your Desktop in the terminal:
cd Desktop
Then run this command:
chmod 755 ./reserve_tfl_bot.py
That second command makes the script '''executable''', which means we can now run the program. Neither command gives much feedback; they should simply work without printing any errors.
[[File:Term2.png|900px]]
== Install Selenium ==
The next thing you need to type in is:
pip3 install selenium
This is basically installing a Python module that will enable the script to control a Google Chrome window. It should run without any errors, and should look something like this:
[[File:Selenium.png|900px]]
== Run the Bot! ==
If you have any Terminal windows open from before, let's close them and open a new one. Once you open a new Terminal window, you can test your bot code by doing:
cd Desktop
./reserve_tfl_bot.py
It will spew a bunch of text to your terminal window, but the main thing is that it should '''also''' open a Google Chrome window on the TFL reservation page, quickly check available dates and times, and, if it doesn't find any, quickly close the Chrome window, then open a new one and repeat until it gets a hit. If it finds a time, it will stop and wait for you to enter name and contact information, credit card number, etc. If you get to that point, I will send you my credit card info immediately so you can enter it.
If you need to get it to stop checking, which you will need to do to stop testing, or if we "win" and get a reservation, just click on the terminal window, and hit "Control C" a bunch of times, and it should stop.
When the time comes, we will all run this step about 30 seconds before 10:00am on November 1st. Then we wait and cross our fingers!
2c087a920a8dbf51d469ed40f84cd74505c7adc4
536
535
2024-10-24T04:23:03Z
Weiler
3
wikitext
text/x-wiki
__TOC__
Here are the instructions to prepare! Let me know if anyone has any issues, we can work through it. Make sure you have Google Chrome installed before doing these steps.
== Download the Bot Script that I Emailed You ==
It should be called '''reserve_tfl_bot.py'''. Save this to your Desktop (which is located at '''/Users/your_user_name/Desktop''').
== Find the Terminal Application ==
Find the "Terminal" application on your Mac. Every Mac has it. You can do a search for it in "spotlight" (the little magnifying glass on the top right of every Mac Desktop screen). It's also in your applications folder (in the "Utilities" sub folder). Once you find it, drag the application icon to your dock to make a shortcut for it.
[[File:Terminal.png|900px]]
== Prepare the '''reserve_tfl_bot.py''' Script ==
Once you have the terminal open, it will look like a white box waiting for you to type in text. This is what is called a "UNIX command line". There are only a couple things you will need to type in here to prepare.
The first thing you should do in the terminal is navigate to the Desktop. This basically changes your current working directory to be your Desktop in the terminal:
cd Desktop
Then run this command:
chmod 755 ./reserve_tfl_bot.py
That second command makes the script '''executable''', which means we can now run the program. Neither command gives much feedback; they should simply work without printing any errors.
[[File:Term2.png|900px]]
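If you want to double-check that the chmod worked, you can list the file and look for the '''x''' characters in the permissions column (shown here on a stand-in file name):

```shell
# Stand-in for the real script; chmod 755 means read/write/execute
# for you, and read/execute for everyone else.
touch example_script.py
chmod 755 example_script.py
ls -l example_script.py    # permissions column should start with -rwxr-xr-x
```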
== Install Selenium ==
The next thing you need to type in is:
pip3 install selenium
This is basically installing a Python module that will enable the script to control a Google Chrome window. It should run without any errors, and should look something like this:
[[File:Selenium.png|900px]]
== Run the Bot! ==
If you have any Terminal windows open from before, let's close them and open a new one. Once you open a new Terminal window, you can test your bot code by doing:
cd Desktop
python3 ./reserve_tfl_bot.py
It will spew a bunch of text to your terminal window, but the main thing is that it should '''also''' open a Google Chrome window on the TFL reservation page, quickly check available dates and times, and if it doesn't find any, quickly close the Chrome window, then open a new one and repeat until it gets a hit. If it finds a time, it will stop and wait for you to enter your name, contact information, credit card number, etc. If you get to that point, I will send you my credit card info immediately so you can enter it.
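The check-and-retry behavior described above can be sketched roughly like this (a hypothetical simplification; the real script uses Selenium to drive an actual Chrome window against the reservation page, and the function and parameter names here are illustrative only):

```python
import time

def reservation_available():
    # Placeholder for the real Selenium check: the actual bot opens the TFL
    # reservation page in Chrome and scans it for open dates and times.
    return False

def poll_for_reservation(check, max_attempts=5, delay_seconds=0.0):
    """Keep re-checking until a slot is found or we run out of attempts."""
    for attempt in range(1, max_attempts + 1):
        if check():
            # Found a time: stop here so the user can type in their name,
            # contact information, and credit card number.
            return attempt
        # No luck: the real bot closes the Chrome window, opens a fresh
        # one, and tries again.
        time.sleep(delay_seconds)
    return None
```

Pressing Control-C interrupts a loop like this, which is why mashing it in the Terminal window is how you stop the bot.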
If you need to get it to stop checking (to end a test, or once we "win" and get a reservation), just click on the terminal window and hit Control-C a bunch of times, and it should stop.
When the time comes, we will all run this step about 30 seconds before 10:00am on November 1st. Then we wait and cross our fingers!
29bf564d207e8740c4ba1e18a51a3203e9aa18a9
File:Terminal.png
6
57
523
2024-10-13T15:07:38Z
Weiler
3
wikitext
text/x-wiki
da39a3ee5e6b4b0d3255bfef95601890afd80709
File:Term2.png
6
58
526
2024-10-13T15:22:37Z
Weiler
3
wikitext
text/x-wiki
da39a3ee5e6b4b0d3255bfef95601890afd80709
File:Selenium.png
6
59
529
2024-10-13T15:28:57Z
Weiler
3
wikitext
text/x-wiki
da39a3ee5e6b4b0d3255bfef95601890afd80709
Using Docker under Slurm
0
44
539
382
2024-12-04T14:45:05Z
Anovak
4
Explain how to pass the right GPUs
wikitext
text/x-wiki
__TOC__
Sometimes it is convenient to ask Slurm to run your job in a Docker container. That is fine, but you will need to fully test your job in a Docker container beforehand (on mustard or emerald, for example) to see how much RAM and CPU it requires, so that you can accurately describe in your Slurm job submission file how many resources it needs.
== Testing ==
You can run your container on mustard and watch 'top' to see how much RAM and CPU it needs.
Be aware that you will need to pull your Docker image from a registry, like DockerHub or Quay, and you should run your Docker container with the '--rm' flag so that the container cleans itself up after running. So your workflow would look something like this:
1: Pull image from DockerHub
2: docker run --rm docker/welcome-to-docker
Optionally you can clean up your image as well, but only if you don't have many jobs using that image on the same node. For example, if I wanted to remove the image labelled "weiler/mytools":
$ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
weiler/mytools latest be6777ad00cf 19 hours ago 396MB
somedude/tools latest 9b1d1f6fbf6f 3 weeks ago 607MB
$ docker image rm be6777ad00cf
== Resource Limits ==
When running Docker containers under Slurm, Slurm cannot limit the resources that Docker uses. Therefore, when you launch a container, you need to know beforehand, from your testing, how much RAM and CPU it uses. Then launch your job with the following --cpus and --memory parameters so Docker itself will limit what it uses:
docker run --rm '''--cpus=16 --memory=1024m''' docker/welcome-to-docker
The --memory value uses the 'm' suffix for megabytes, so the above example sets a memory limit of 1 GB (1024 MB).
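Putting the two halves together, a submission script might look something like this (a sketch only; the job name, image, and limits are placeholders — substitute the numbers you measured during testing):

```shell
#!/bin/bash
#SBATCH --job-name=docker-example
#SBATCH --cpus-per-task=16
#SBATCH --mem=1024M

# Ask Slurm for the same CPU and memory that you pass to Docker, so the
# scheduler's accounting matches what the container can actually consume.
docker run --rm --cpus=16 --memory=1024m docker/welcome-to-docker
```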
== Docker and GPUs ==
If you are using GPUs with Docker, you need to make sure that your Docker container requests access to the ''correct'' GPUs: the ones which Slurm assigned to your job. These will be passed in the <code>SLURM_STEP_GPUS</code> (for GPUs for a single step) or <code>SLURM_JOB_GPUS</code> (for GPUs for a whole job) environment variables. They need to be passed to Docker like this:
docker run --gpus="\"device=${SLURM_STEP_GPUS:-$SLURM_JOB_GPUS}\"" nvidia/cuda nvidia-smi
'''Note the escaped quotes'''; the Docker command needs to have double-quotes ''inside'' the argument value. The <code>${:-}</code> syntax will use <code>SLURM_STEP_GPUS</code> if it is set and <code>SLURM_JOB_GPUS</code> if it isn't; if you know which will be set for your job, you can use just that one.
If you are using Nextflow, you will need to set <code>docker.runOptions</code> to include this flag.
docker.runOptions="--gpus \\\"device=$SLURM_JOB_GPUS\\\""
If you are using Toil to run CWL or WDL, the correct GPUs will be passed to containers automatically.
== Cleaning Scripts ==
We also have auto-cleaning scripts running that will delete any containers and images that were created/pulled more than 7 days ago. This includes the cluster nodes and also the phoenix head node itself. If you need a place to have your images/containers remain longer than that, please put them on mustard, emerald, crimson or razzmatazz.
Also, there are cleaning scripts in place that will destroy any running containers that have been running for over 7 days. We assume that such a container was not launched with '''--rm''' and needs to be cleaned up.
6d66722307eb115352bf6eb143c4808b5652edf5
Requirements for dbGaP Access
0
19
541
487
2025-01-19T22:26:16Z
Weiler
3
wikitext
text/x-wiki
If you need NIH dbGaP access, there are several requirements to gain access - please complete all of these requirements '''BEFORE''' requesting dbGaP credentials. NOTE: If you already have GI VPN access to the GI "Prism" Environment, then you have already completed the requirements detailed below - let the GI Cluster Admin Group (cluster-admin@soe.ucsc.edu) know and we can quickly move to getting you set up.
Please use this checklist to make sure that you have completed all '''three''' requirements.
'''1'''. Your PI's info and your PI's approval
'''2'''. NIH Public Security Refresher Course Certificate
'''3'''. Signed NIH Genomic Data Sharing Policy Agreement
'''1''': You are required to ask your PI or sponsor to email '''cluster-admin@soe.ucsc.edu''' requesting dbGaP access for you - this email should include:
Your name
Your PI's name
PI's approval for this access
'''2''': You must take the NIH Public Security Refresher Course online, then print out the Completion Certificate (which should have your name on it) at the end of the training and deliver it to the GI Grants Team. You must complete the course in a single continuous sitting in order to be able to print the certificate at the end:
https://irtsectraining.nih.gov/publicUser.aspx
Click on the "2020 Information Security, Counterintelligence, Privacy Awareness, Records Management Refresher" link to begin the course. At the end you will be able to print out the completion certificate that should have your name on it.
'''3''': Please print and read the entire NIH Genomic Data Sharing Policy agreement (linked below for download), sign the last page of the document, then scan and email the executed document to cluster-admin@soe.ucsc.edu with a subject line that includes: NIH GDS document. By signing the document you agree that you have read and understand the policies described therein and that you agree to abide by those policies:
[[Media:NIH_GDS_Policy.pdf]]
6d146e05c1221f03c05b7f4ca296f495e1d57c26
Resetting your VPN/PRISM Password
0
60
543
2025-01-19T22:31:52Z
Weiler
3
Created page with "If you have forgotten you VPN password (which is also your PRISM UNIX password), send am email to '''cluster-admin@soe.ucsc.edu''' requesting that your password be reset (include your username in the request). Once we have sent you you new temporary password, you will need to: * Log into the PRISM VPN using this new temporary password. * Log into one of the server behind the firewall (mustard, emerald, crimson or razzmatazz) using you new temporary password. * Once..."
wikitext
text/x-wiki
If you have forgotten your VPN password (which is also your PRISM UNIX password), send an email to '''cluster-admin@soe.ucsc.edu''' requesting that your password be reset (include your username in the request).
Once we have sent you your new temporary password, you will need to:
1: Log into the PRISM VPN using this new temporary password.
2: Log into one of the servers behind the firewall (mustard, emerald, crimson or razzmatazz) using your new temporary password.
3: Once you log in there, it should ask you to type in your temporary password one more time, then it will ask you to choose a new password. If it does not ask you to change your password, use the '''passwd''' command to change it. Once you choose a new password (and type it twice for confirmation), log out of your SSH session. '''NOTE:''' Your new password must be 10 characters long, using three or more character classes (lowercase, uppercase, number or special character).
4: Log out (disconnect) from the VPN. '''This step is very important!'''
5: Log back into the VPN using the '''new''' password that you chose in step 3.
6: Log back into one of the servers (mustard, emerald, crimson or razzmatazz) using your new password.
Assuming all that works, your password has been reset. You cannot reset your password to one of the prior five passwords you have used for your account.
23bad8438aecb5e7a8a63d06242ec400bca148fd
Requirement for users to get GI VPN access
0
9
555
306
2025-02-08T15:19:44Z
Weiler
3
wikitext
text/x-wiki
Before you are allowed access to our firewalled/secure area ("Prism"), you have to complete three items and provide the completed certificates or forms:
'''1''': You must take and complete the NIH Public Security Refresher Course online. You must complete the course in a single continuous sitting:
https://irtsectraining.nih.gov/public.aspx
Click on the "Enter Public Training Portal" near the bottom of the page. The course is titled "2022 Information Security, Insider Threats, Privacy Awareness, Records Management and Emergency Preparedness Refresher". At the end you will be able to save the completion certificate that should have your name on it.
'''2''': You need to sign the Genomics Institute VPN User Agreement (digital signature OK), located here for download:
[[Media:GI_VPN_Policy.pdf]]
'''3''': Please read and sign the last page of the NIH Genomic Data Sharing Policy agreement (digital signature OK), located here for download. By signing the document you agree that you have read and understand the policies described therein and that you agree to abide by those policies:
[[Media:NIH_GDS_Policy.pdf]]
When you have the three documents described above ready, please complete this form: https://app.smartsheet.com/b/form/a76dbd90ba0240ab9ea9d39b390586ce.
There are two parts to this process.
1. For the user: please fill in ALL required fields '''and attach''' all three required documents described above.
2. For the Sponsor/PI: the Sponsor/PI will receive an email from Smartsheet and must fill in all required fields and submit. The form then goes to your PI for approval - remind them to approve it, or it won't get sent to us for processing!
We will receive your completed request and we will create your account and go over the details via a short zoom meeting with you.
You will need access to the "eduroam" wireless network '''prior''' to your zoom appointment if you are on campus. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions preventing it from functioning. Some other universities have such restrictions (notably UCSF), but most other wireless networks, including home networks, should work fine.
Before your appointment, please make sure you have a laptop running OS X, Windows or Ubuntu, and install the appropriate OpenVPN software on it:
For Macs, please download and install '''Tunnelblick''' from https://giwiki.gi.ucsc.edu/downloads/Tunnelblick.dmg - we will send you the username and password for the website via email.
For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
For Ubuntu, please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
We will correspond with you via email on when the appointment will be. The zoom meeting can take up to 30 minutes per person depending on whether or not any issues come up during the software setup. If you show up for your appointment without one (or more) of the above requirements, we will have to reschedule your appointment for a time after you have completed them.
'''PLEASE NOTE:''' Because of the overhead required in setting up VPN access, please only request access if you have an immediate need to work on data that exists behind the firewall. We have had a decent number of people request access and go through the setup but then never use it. In other words, please do not request access because "one day you might need it", but because you '''do''' actually need it!
'''ALSO NOTE:''' VPN accounts expire after one year from the date of first gaining access. To renew for another year you will need your PI/sponsor to send us a note asking for renewal.
a8870673d344c8a3a7805cbb9b7e1a45eb99a8b0
556
555
2025-02-08T15:20:49Z
Weiler
3
wikitext
text/x-wiki
Before you are allowed access to our firewalled/secure area ("Prism"), you have to complete 3 items and provide the completed certificates or forms:
'''1''': You must take and complete the NIH Public Security Refresher Course online. You must complete the course in a single continuous sitting:
https://irtsectraining.nih.gov/public.aspx
Click on the "Enter Public Training Portal" near the bottom of the page. The course is titled "2022 Information Security, Insider Threats, Privacy Awareness, Records Management and Emergency Preparedness Refresher". At the end you will be able to save the completion certificate that should have your name on it.
'''2''': You need to sign the Genomics Institute VPN User Agreement (digital signature OK), located here for download:
[[Media:GI_VPN_Policy.pdf]]
'''3''': Please read and sign the last page of the NIH Genomic Data Sharing Policy agreement (digital signature OK), located here for download. By signing the document you agree that you have read and understand the policies described therein and that you agree to abide by those policies:
[[Media:NIH_GDS_Policy.pdf]]
When you have the three documents described above ready, please complete this form: https://app.smartsheet.com/b/form/a76dbd90ba0240ab9ea9d39b390586ce.
There are two parts to this process.
1. For the user, please fill in ALL required fields '''and attach''' all three required documents described above.
2. For the Sponsor/PI - you will receive an email from Smartsheet. Please fill in all required fields and submit. The form then goes to your PI for approval - remind them to approve it, or it won't get sent to us for processing!
We will receive your completed request and we will create your account and go over the details via a short zoom meeting with you.
You will need access to the "eduroam" wireless network '''prior''' to your zoom appointment if you are on campus. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions preventing it from functioning. Some other universities have such restrictions (notably UCSF), but most other wireless networks, including home networks, should work fine.
Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop:
You will need a laptop running macOS, Windows, or Ubuntu.
For Macs, please download and install '''Tunnelblick''' from https://giwiki.gi.ucsc.edu/downloads/Tunnelblick.dmg - we will send you the username and password for the website via email. This VPN client is pre-packaged by us and will install fully configured and ready to go.
For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
For Ubuntu, please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
We will correspond with you via email about when the appointment will be. The Zoom meeting can take up to 30 minutes per person, depending on whether any issues come up during the software setup. If you arrive at your appointment without one or more of the requirements outlined above, we will have to reschedule for a time after you have completed them.
'''PLEASE NOTE:''' Because of the overhead required in setting up VPN access, please only request access if you have an immediate need to work on data that exists behind the firewall. We have had a decent number of people request access and go through the setup but then never use it. In other words, please do not request access because "one day you might need it", but because you '''do''' actually need it!
'''ALSO NOTE:''' VPN accounts expire after one year from the date of first gaining access. To renew for another year you will need your PI/sponsor to send us a note asking for renewal.
b294b737ee82d986d237a5b0d19ed95a5ba07477
557
556
2025-02-08T15:21:58Z
Weiler
3
wikitext
text/x-wiki
Before you are allowed access to our firewalled/secure area ("Prism"), you have to complete 3 items and provide the completed certificates or forms:
'''1''': You must take and complete the NIH Public Security Refresher Course online. You must complete the course in a single continuous sitting:
https://irtsectraining.nih.gov/public.aspx
Click on the "Enter Public Training Portal" near the bottom of the page. The course is titled "2022 Information Security, Insider Threats, Privacy Awareness, Records Management and Emergency Preparedness Refresher". At the end you will be able to save the completion certificate that should have your name on it.
'''2''': You need to sign the Genomics Institute VPN User Agreement (digital signature OK), located here for download:
[[Media:GI_VPN_Policy.pdf]]
'''3''': Please read and sign the last page of the NIH Genomic Data Sharing Policy agreement (digital signature OK), located here for download. By signing the document you agree that you have read and understand the policies described therein and that you agree to abide by those policies:
[[Media:NIH_GDS_Policy.pdf]]
When you have the three documents described above ready, please complete this form: https://app.smartsheet.com/b/form/a76dbd90ba0240ab9ea9d39b390586ce.
There are two parts to this process.
1. For the user, please fill in ALL required fields '''and attach''' all three required documents described above. The form then goes to your PI for approval - remind them to approve it, or it won't get sent to us for processing!
2. For the Sponsor/PI - you will receive an email from Smartsheet. Please fill in all required fields and submit.
We will receive your completed request and we will create your account and go over the details via a short zoom meeting with you.
You will need access to the "eduroam" wireless network '''prior''' to your zoom appointment if you are on campus. Other UCSC wireless networks such as "cruznet" will not work with our VPN software, so please make sure your laptop works with eduroam before coming to your appointment. Instructions on how to get on eduroam are detailed here:
https://its.ucsc.edu/wireless/eduroam.html
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions preventing it from functioning. Some other universities have such restrictions (notably UCSF), but most other wireless networks, including home networks, should work fine.
Before your appointment, please make sure you install the appropriate OpenVPN software on your laptop:
You will need a laptop running macOS, Windows, or Ubuntu.
For Macs, please download and install '''Tunnelblick''' from https://giwiki.gi.ucsc.edu/downloads/Tunnelblick.dmg - we will send you the username and password for the website via email. This VPN client is pre-packaged by us and will install fully configured and ready to go.
For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
For Ubuntu, please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
Please do NOT worry about how to configure the software at this point. We will help you to set it up at your appointment.
We will correspond with you via email about when the appointment will be. The Zoom meeting can take up to 30 minutes per person, depending on whether any issues come up during the software setup. If you arrive at your appointment without one or more of the requirements outlined above, we will have to reschedule for a time after you have completed them.
'''PLEASE NOTE:''' Because of the overhead required in setting up VPN access, please only request access if you have an immediate need to work on data that exists behind the firewall. We have had a decent number of people request access and go through the setup but then never use it. In other words, please do not request access because "one day you might need it", but because you '''do''' actually need it!
'''ALSO NOTE:''' VPN accounts expire after one year from the date of first gaining access. To renew for another year you will need your PI/sponsor to send us a note asking for renewal.
18454863b7258af62dae4a91d90d577339c52b71
577
557
2025-02-09T16:06:40Z
Weiler
3
wikitext
text/x-wiki
Before you are allowed access to our firewalled/secure area ("Prism"), you have to complete 3 items and provide the completed certificates or forms:
'''1''': You must take and complete the NIH Public Security Refresher Course online. You must complete the course in a single continuous sitting:
https://irtsectraining.nih.gov/public.aspx
Click on the "Enter Public Training Portal" near the bottom of the page. The course is titled "2022 Information Security, Insider Threats, Privacy Awareness, Records Management and Emergency Preparedness Refresher". At the end you will be able to save the completion certificate that should have your name on it.
'''2''': You need to sign the Genomics Institute VPN User Agreement (digital signature OK), located here for download:
[[Media:GI_VPN_Policy.pdf]]
'''3''': Please read and sign the last page of the NIH Genomic Data Sharing Policy agreement (digital signature OK), located here for download. By signing the document you agree that you have read and understand the policies described therein and that you agree to abide by those policies:
[[Media:NIH_GDS_Policy.pdf]]
When you have the three documents described above ready, please complete this form: https://app.smartsheet.com/b/form/a76dbd90ba0240ab9ea9d39b390586ce.
There are two parts to this process.
1. For the user, please fill in ALL required fields '''and attach''' all three required documents described above. The form then goes to your PI for approval - remind them to approve it, or it won't get sent to us for processing!
2. For the Sponsor/PI - you will receive an email from Smartsheet. Please fill in all required fields and submit.
We will receive your completed request and we will create your account, then you will receive a welcome email with instructions on how to configure your VPN and gain access to our systems.
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions preventing it from functioning. Some other universities have such restrictions (notably UCSF), but most other wireless networks, including home networks, should work fine.
For Windows, please download and install '''OpenVPN Client''' from https://openvpn.net/index.php/open-source/downloads.html. Select ''openvpn-install-x.x.x-xxxx.exe''
For Ubuntu, please install network-manager-openvpn by typing:
sudo apt-get install network-manager-openvpn network-manager-openvpn-gnome
'''PLEASE NOTE:''' Because of the overhead required in setting up VPN access, please only request access if you have an immediate need to work on data that exists behind the firewall. We have had a decent number of people request access and go through the setup but then never use it. In other words, please do not request access because "one day you might need it", but because you '''do''' actually need it!
'''ALSO NOTE:''' VPN accounts typically expire after one year from the date of first gaining access. To renew for another year you will need your PI/sponsor to send us a note asking for renewal.
522e3d1e6d466eece613657957555c954a9d004e
597
577
2025-02-09T17:17:39Z
Weiler
3
wikitext
text/x-wiki
Before you are allowed access to our firewalled/secure area ("Prism"), you have to complete 3 items and provide the completed certificates or forms:
'''1''': You must take and complete the NIH Public Security Refresher Course online. You must complete the course in a single continuous sitting:
https://irtsectraining.nih.gov/public.aspx
Click on the "Enter Public Training Portal" near the bottom of the page. The course is titled "2022 Information Security, Insider Threats, Privacy Awareness, Records Management and Emergency Preparedness Refresher". At the end you will be able to save the completion certificate that should have your name on it.
'''2''': You need to sign the Genomics Institute VPN User Agreement (digital signature OK), located here for download:
[[Media:GI_VPN_Policy.pdf]]
'''3''': Please read and sign the last page of the NIH Genomic Data Sharing Policy agreement (digital signature OK), located here for download. By signing the document you agree that you have read and understand the policies described therein and that you agree to abide by those policies:
[[Media:NIH_GDS_Policy.pdf]]
When you have the three documents described above ready, please complete this form: https://app.smartsheet.com/b/form/a76dbd90ba0240ab9ea9d39b390586ce.
There are two parts to this process.
1. For the user, please fill in ALL required fields '''and attach''' all three required documents described above. The form then goes to your PI for approval - remind them to approve it, or it won't get sent to us for processing!
2. For the Sponsor/PI - you will receive an email from Smartsheet. Please fill in all required fields and submit.
We will receive your completed request and we will create your account, then you will receive a welcome email with instructions on how to configure your VPN client and gain access to our systems.
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions preventing it from functioning. Some other universities have such restrictions (notably UCSF), but most other wireless networks, including home networks, should work fine.
'''PLEASE NOTE:''' Because of the overhead required in setting up VPN access, please only request access if you have an immediate need to work on data that exists behind the firewall. We have had a decent number of people request access and go through the setup but then never use it. In other words, please do not request access because "one day you might need it", but because you '''do''' actually need it!
'''ALSO NOTE:''' VPN accounts typically expire after one year from the date of first gaining access. To renew for another year you will need your PI/sponsor to send us a note asking for renewal.
e5ef1754dcb20f95225e7109f708feeec0e12c3b
How to access the public servers
0
11
558
429
2025-02-08T16:14:36Z
Weiler
3
wikitext
text/x-wiki
== How to Gain Access to the Public Genomics Institute Compute Servers ==
If you need access to the Genomics Institute compute servers please complete this request form:
https://app.smartsheet.com/b/form/a76dbd90ba0240ab9ea9d39b390586ce. There are two parts to this process.
1. For the user, please fill in ALL required fields and submit. The form will get sent to your PI for approval. Remind them to approve it, or it won't get sent to the systems group for processing.
2. For the Sponsor/PI - you will receive an email from Smartsheet. Please fill in all required fields and submit.
We will receive your completed request and we will create your account and go over the details via a short zoom meeting with you.
== Account and Storage Cost ==
Costs for having an active UNIX account and for storage (per TB) are listed in this document, under "Genomics Project Support", specifically "Genomics IT Systems User Support per user" and "Genomics Data Storage per TB":
https://planning.ucsc.edu/budget/rates-and-assessments/recharge-rates/docs/2021-22-approved-recharge-rates.pdf
== Account Expiration ==
Your UNIX account will have an expiration date associated with it after creation, as requested by your sponsor. Please take note of this expiration date when your account is created.
You will receive notice by email when your account is about to expire. To renew, simply ask the PI who sponsored you (named in the notice) to email '''cluster-admin@soe.ucsc.edu''' requesting that your account be renewed for another year, or any other amount of time.
If your account expires, the account will be suspended and you will no longer be able to login or view any data you may have in our systems. Any automated scripts (owned by you) that run via cron or other mechanisms will cease to function.
== Server Types and Management==
You can log into our public compute servers via SSH:
'''courtyard.gi.ucsc.edu''': 1TB RAM, 64 cores, 672GB local scratch space, Ubuntu 22.04.2
'''park.gi.ucsc.edu''': 256GB RAM, 32 cores, 5TB local scratch space, Ubuntu 22.04.2
These servers are managed by the Genomics Institute Cluster Admin group. If you need software installed on them, please make your request by emailing cluster-admin@soe.ucsc.edu.
== Storage ==
These servers mount two types of storage: home directories and group storage directories. Your home directory will be located at "/public/home/username" and has a 30GB quota. The group storage directories are created per PI, and each group directory has a default 15TB quota (although in some cases the quota is higher). For example, if David Haussler is the PI that you report to directly, then the directory would exist as /public/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15TB available per group accordingly.
On the compute servers you can check your group's current quota usage by using the '/usr/bin/viewquota' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). If you wanted to check the quota usage of /public/groups/hausslerlab for example, you would do:
$ viewquota hausslerlab
Project quota on /export (/dev/mapper/export)
Project ID     Used   Soft   Hard  Warn/Grace
-----------    ----   ----   ----  ----------
hausslerlab    1.8T    15T    16T  00 [------]
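If you want to act on these numbers in a script, the report is easy to parse. A minimal sketch, using the sample line from the output above (the awk parsing is our own illustration, not part of the viewquota tool):

```shell
# Parse a viewquota-style data line and report what fraction of the
# soft quota is in use. The sample line is copied from the wiki text;
# in practice you would capture it with: viewquota <group> | tail -1
report='hausslerlab 1.8T 15T 16T 00 [------]'

# Field 2 is current usage, field 3 is the soft limit; strip the "T" suffix.
pct=$(echo "$report" | awk '{u=$2; s=$3; sub(/T$/,"",u); sub(/T$/,"",s); printf "%d", (u/s)*100}')
echo "${pct}% of soft quota used"
```

This assumes both values carry the same "T" unit suffix, as in the example; a robust version would normalize mixed units first.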
== Actually Doing Work and Computing ==
When doing research and running jobs, please be careful of your resource consumption on the server you are on. Don't run so many threads or processes at once that they exhaust the available RAM or disk IO. If you are not sure of your potential RAM, CPU, or disk impact, start small with one or two processes and work your way up from there. Also, before running your jobs, use the 'top' command to see who and what else is running and what resources are already being consumed. If, after starting a process, you realize that the server slows down considerably or becomes unusable, kill your processes and re-evaluate what you need to make things work. These servers are shared resources - be a good neighbor!
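A quick pre-flight check along these lines might look like the following sketch ("my_job" is a hypothetical command standing in for your own program):

```shell
# Before launching work on a shared server: see how many cores exist
# and how much memory is currently available, then start one process
# at low priority and scale up only after watching `top`.
cores=$(nproc)                                     # total cores on this machine
echo "cores available: $cores"
free -h | awk 'NR==2 {print "memory available: " $7}'  # "available" column of the Mem: row
# Run a single polite instance first (hypothetical job):
# nice -n 19 ./my_job --threads 2
```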
== Serving Files to the Public via the Web ==
If you want to set up a web page on courtyard, or serve files over HTTP from there, do this:
mkdir /public/home/''your_username''/public_html
chmod 755 /public/home/''your_username''/public_html
Put data in the public_html directory. The URL will be:
http://public.gi.ucsc.edu/''~username''/
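The two steps above can be sketched and verified as below. To keep the sketch runnable anywhere, it uses a temporary directory as a stand-in for /public/home/''your_username'' on courtyard:

```shell
# Stand-in for /public/home/$USER; on courtyard use the real path instead.
base=$(mktemp -d)

mkdir "$base/public_html"
chmod 755 "$base/public_html"    # world-readable and traversable for the web server

# Confirm the permissions came out as 755.
perms=$(stat -c '%a' "$base/public_html")
echo "public_html permissions: $perms"

rm -rf "$base"   # clean up the stand-in directory
```

Note that the web server also needs execute (traverse) permission on every parent directory of public_html, so check those too if your files don't appear at the URL.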
== /data/scratch Space on the Servers ==
Each server will generally have a local /data/scratch filesystem that you can use to store temporary files. '''BE ADVISED''' that /data/scratch is not backed up, and the data there could disappear in the event of a disk failure or anything else. Do not store important data there. If it is important, it should be moved somewhere else very soon after creation.
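A safe working pattern for scratch space is to create a unique directory per job, clean it up automatically, and copy anything worth keeping to durable storage as soon as the job finishes. A sketch, with mktemp's default temp area standing in for /data/scratch and a second temp directory standing in for your group storage:

```shell
# On the servers you would use: scratch=$(mktemp -d -p /data/scratch)
scratch=$(mktemp -d)
durable=$(mktemp -d)             # stand-in for /public/groups/<lab> storage

trap 'rm -rf "$scratch"' EXIT    # scratch is cleaned up even if the job fails

# ... the job writes its intermediates into $scratch ...
echo "intermediate results" > "$scratch/part1.txt"

# Move keepers off scratch promptly; scratch is not backed up.
cp "$scratch/part1.txt" "$durable/"
echo "saved $(basename "$durable/part1.txt") to durable storage"
```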
a7deb4be75f3c9016cd3b4d69b12e7f13e3e1c95
Genomics Institute Computing Information
0
6
559
542
2025-02-08T19:31:55Z
Weiler
3
/* VPN Access */
wikitext
text/x-wiki
Welcome to the Genomics Institute Computing Information Repository! Browse the topics below for help in the area you are interested in.
== GI Public Computing Environment ==
*[[How to access the public servers]]
== GI Firewalled Computing Environment (PRISM) ==
*[[Access to the Firewalled Compute Servers]]
*[[Firewalled Computing Resources Overview]]
*[[Firewalled Environment Storage Overview]]
*[[Firewalled User Account and Storage Cost]]
*[[Grafana Performance Metrics]]
*[[Visual Studio Code (vscode) Configuration Tweaks]]
*[http://logserv.gi.ucsc.edu/cgi-bin/private-groups.cgi '''/private/groups''' Data Usage Graphs]
*[[Resetting your VPN/PRISM Password]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
*[[Setting Up The VPN on a Mac]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Cluster Etiquette]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[GPU Resources]]
*[[Quick Reference Guide]]
*[[Convenient Slurm Commands]]
*[[Slurm Queues (Partitions) and Resource Management]]
*[[Slurm Tips for vg]]
*[[Slurm Tips for Toil]]
*[[Using Docker under Slurm]]
*[[Phoenix WDL Tutorial]]
==General Docker Information==
*[[Running a Container as a non-root User]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
873c10870b3096bb1872bccdb3f41c74b924ac60
560
559
2025-02-09T00:18:01Z
Weiler
3
/* VPN Access */
wikitext
text/x-wiki
Welcome to the Genomics Institute Computing Information Repository! Browse the topics below for help in the area you are interested in.
== GI Public Computing Environment ==
*[[How to access the public servers]]
== GI Firewalled Computing Environment (PRISM) ==
*[[Access to the Firewalled Compute Servers]]
*[[Firewalled Computing Resources Overview]]
*[[Firewalled Environment Storage Overview]]
*[[Firewalled User Account and Storage Cost]]
*[[Grafana Performance Metrics]]
*[[Visual Studio Code (vscode) Configuration Tweaks]]
*[http://logserv.gi.ucsc.edu/cgi-bin/private-groups.cgi '''/private/groups''' Data Usage Graphs]
*[[Resetting your VPN/PRISM Password]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
*[[Setting Up The VPN on MacOS]]
*[[Setting Up The VPN on Windows]]
*[[Setting Up The VPN on Linux]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Cluster Etiquette]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[GPU Resources]]
*[[Quick Reference Guide]]
*[[Convenient Slurm Commands]]
*[[Slurm Queues (Partitions) and Resource Management]]
*[[Slurm Tips for vg]]
*[[Slurm Tips for Toil]]
*[[Using Docker under Slurm]]
*[[Phoenix WDL Tutorial]]
==General Docker Information==
*[[Running a Container as a non-root User]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
6f6df3ec8680ba9dcaada261c086c26b6e99deec
603
560
2025-02-10T17:50:23Z
Weiler
3
/* VPN Access */
wikitext
text/x-wiki
Welcome to the Genomics Institute Computing Information Repository! Browse the topics below for help in the area you are interested in.
== GI Public Computing Environment ==
*[[How to access the public servers]]
== GI Firewalled Computing Environment (PRISM) ==
*[[Access to the Firewalled Compute Servers]]
*[[Firewalled Computing Resources Overview]]
*[[Firewalled Environment Storage Overview]]
*[[Firewalled User Account and Storage Cost]]
*[[Grafana Performance Metrics]]
*[[Visual Studio Code (vscode) Configuration Tweaks]]
*[http://logserv.gi.ucsc.edu/cgi-bin/private-groups.cgi '''/private/groups''' Data Usage Graphs]
*[[Resetting your VPN/PRISM Password]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
*[[Setting Up The VPN on MacOS]]
*[[Setting Up The VPN on Windows]]
*[[Setting Up The VPN on Linux]]
*[[Multi Factor Authentication (MFA) Frequently Asked Questions]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Cluster Etiquette]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[GPU Resources]]
*[[Quick Reference Guide]]
*[[Convenient Slurm Commands]]
*[[Slurm Queues (Partitions) and Resource Management]]
*[[Slurm Tips for vg]]
*[[Slurm Tips for Toil]]
*[[Using Docker under Slurm]]
*[[Phoenix WDL Tutorial]]
==General Docker Information==
*[[Running a Container as a non-root User]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
0f97ec3a9491ee90e4d0a24018fa091028ea4ee6
Setting Up The VPN on MacOS
0
61
561
2025-02-09T15:14:03Z
Weiler
3
Created page with "For MacOS, you will be installing "Tunnelblick", an OpenVPN client software package for Mac. This software is pre-packaged for our systems and already includes built-in configuration. Do not install this software on public or shared computers! You will need to download Tunnelblick from this link: https://giwiki.gi.ucsc.edu/downloads/Tunnelblick.dmg The username and password to access that website will be sent to you in your account creation welcome email. Once you..."
wikitext
text/x-wiki
For MacOS, you will be installing "Tunnelblick", an OpenVPN client software package for Mac. This software is pre-packaged for our systems and already includes built-in configuration. Do not install this software on public or shared computers! You will need to download Tunnelblick from this link:
https://giwiki.gi.ucsc.edu/downloads/Tunnelblick.dmg
The username and password to access that website will be sent to you in your account creation welcome email.
Once you have downloaded the file, navigate in the Finder to wherever you downloaded it to and double click on it. It should open the .dmg file and you should see the Tunnelblick application icon. Double click on Tunnelblick to install. During installation, it will ask you if you want to install for "Only You" or "All Users". Select "Only You".
After installation, in your Finder, you may want to navigate to the Applications folder and drag the Tunnelblick icon to your dock for easy launching.
After launching the software from the Applications folder, you will see a small "tunnel" icon at the top right of your screen. You should be able to click on that icon, then click "Connect prism" to start the VPN. Use the username and temporary password to log in to the VPN for the first time. After typing in your username and password, you will be sent a Duo MFA push to your phone. Accept that push, and then you will be connected.
Once you authenticate to the VPN (username/password/MFA), log in via SSH to 'mustard.prism', for example, and you will be asked to change your password. Once you have changed it, log out of mustard, then disconnect from the VPN (click the Tunnelblick icon at the top right of your screen and select "Disconnect"). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, and then you should be logged in!
2554f148f5e21b7ae4354b15c4155bbdd5a7509d
562
561
2025-02-09T15:14:37Z
Weiler
3
wikitext
text/x-wiki
For MacOS, you will be installing "Tunnelblick", an OpenVPN client software package for Mac. This software is pre-packaged for our systems and already includes built-in configuration. Do not install this software on public or shared computers! You will need to download Tunnelblick from this link:
https://giwiki.gi.ucsc.edu/downloads/Tunnelblick.dmg
The username and password to access that website will be sent to you in your account creation welcome email.
Once you have downloaded the file, navigate in the Finder to wherever you downloaded it to and double click on it. It should open the .dmg file and you should see the Tunnelblick application icon. Double click on the Tunnelblick icon to install. During installation, it will ask you if you want to install for "Only You" or "All Users". Select "Only You".
After installation, in your Finder, you may want to navigate to the Applications folder and drag the Tunnelblick icon to your dock for easy launching.
After launching the software from the Applications folder, you will see a small "tunnel" icon at the top right of your screen. You should be able to click on that icon, then click "Connect prism" to start the VPN. Use the username and temporary password to log in to the VPN for the first time. After typing in your username and password, you will be sent a Duo MFA push to your phone. Accept that push, and then you will be connected.
Once you authenticate to the VPN (username/password/MFA), log in via SSH to 'mustard.prism', for example, and you will be asked to change your password. Once you have changed it, log out of mustard, then disconnect from the VPN (click the Tunnelblick icon at the top right of your screen and select "Disconnect"). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, and then you should be logged in!
44bdcc16c261cfed25f544358bf71ccc2a60db0f
563
562
2025-02-09T15:21:43Z
Weiler
3
wikitext
text/x-wiki
For MacOS, you will be installing "Tunnelblick", an OpenVPN client software package for Mac. This software is pre-packaged for our systems and already includes built-in configuration. Do not install this software on public or shared computers!
Before installing Tunnelblick, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment (or if you enrolled some time ago), continue to install Tunnelblick. You will need to download Tunnelblick from this link:
https://giwiki.gi.ucsc.edu/downloads/Tunnelblick.dmg
The username and password to access that website will be sent to you in your account creation welcome email.
Once you have downloaded the file, navigate in the Finder to wherever you downloaded it to and double click on it. It should open the .dmg file and you should see the Tunnelblick application icon. Double click on the Tunnelblick icon to install. During installation, it will ask you if you want to install for "Only You" or "All Users". Select "Only You".
After installation, in your Finder, you may want to navigate to the Applications folder and drag the Tunnelblick icon to your dock for easy launching.
After launching the software from the Applications folder, you will see a small "tunnel" icon on the top right of your screen. You should be able to click on that icon, then click "Connect prism" to start the VPN. Use the username and temporary password to login to the VPN for the first time. After typing in your username and password, you will be sent a Duo MFA push to your phone. Accept that push, and then you will be connected.
Once you authenticate to the VPN (username/password/MFA), then login via SSH to 'mustard.prism' for example, and you will be asked to change your password. Once you change your password, log out of mustard, then log out of the VPN (click the Tunnelblick icon on the top right of your screen and select "disconnect"). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
059a9cece9149ddb31d0279fcffcc9766dab5738
564
563
2025-02-09T15:21:52Z
Weiler
3
wikitext
text/x-wiki
For MacOS, you will be installing "Tunnelblick", an OpenVPN client software package for Mac. This software is pre-packaged for our systems and already includes built-in configuration. Do not install this software on public or shared computers!
Before installing Tunnelblick, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install Tunnelblick. You will need to download Tunnelblick from this link:
https://giwiki.gi.ucsc.edu/downloads/Tunnelblick.dmg
The username and password to access that website will be sent to you in your account creation welcome email.
Once you have downloaded the file, navigate in the Finder to wherever you downloaded it to and double click on it. It should open the .dmg file and you should see the Tunnelblick application icon. Double click on the Tunnelblick icon to install. During installation, it will ask you if you want to install for "Only You" or "All Users". Select "Only You".
After installation, in your Finder, you may want to navigate to the Applications folder and drag the Tunnelblick icon to your dock for easy launching.
After launching the software from the Applications folder, you will see a small "tunnel" icon on the top right of your screen. You should be able to click on that icon, then click "Connect prism" to start the VPN. Use the username and temporary password to login to the VPN for the first time. After typing in your username and password, you will be sent a Duo MFA push to your phone. Accept that push, and then you will be connected.
Once you authenticate to the VPN (username/password/MFA), then login via SSH to 'mustard.prism' for example, and you will be asked to change your password. Once you change your password, log out of mustard, then log out of the VPN (click the Tunnelblick icon on the top right of your screen and select "disconnect"). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
80c44cae8a33962b73baad5b7aef822dde5968b6
565
564
2025-02-09T15:22:45Z
Weiler
3
wikitext
text/x-wiki
For MacOS, you will be installing "Tunnelblick", an OpenVPN client software package for Mac. This software is pre-packaged for our systems and already includes built-in configuration. Do not install this software on public or shared computers!
Before installing Tunnelblick, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install Tunnelblick. You will need to download Tunnelblick from this link:
https://giwiki.gi.ucsc.edu/downloads/Tunnelblick.dmg
The username and password to access that web link will be sent to you in your account creation welcome email.
Once you have downloaded the file, navigate in the Finder to wherever you downloaded it to and double click on it. It should open the .dmg file and you should see the Tunnelblick application icon. Double click on the Tunnelblick icon to install. During installation, it will ask you if you want to install for "Only You" or "All Users". Select "Only You".
After installation, in your Finder, you may want to navigate to the Applications folder and drag the Tunnelblick icon to your dock for easy launching.
After launching the software from the Applications folder, you will see a small "tunnel" icon on the top right of your screen. You should be able to click on that icon, then click "Connect prism" to start the VPN. Use the username and temporary password to login to the VPN for the first time. After typing in your username and password, you will be sent a Duo MFA push to your phone. Accept that push, and then you will be connected.
Once you authenticate to the VPN (username/password/MFA), then login via SSH to 'mustard.prism' for example, and you will be asked to change your password. Once you change your password, log out of mustard, then log out of the VPN (click the Tunnelblick icon on the top right of your screen and select "disconnect"). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
47cce4b35454d9483a1a45df95519c7e296b717b
566
565
2025-02-09T15:24:28Z
Weiler
3
wikitext
text/x-wiki
For MacOS, you will be installing "Tunnelblick", an OpenVPN client software package for Mac. This software is pre-packaged for our systems and already includes built-in configuration. Do not install this software on public or shared computers!
Before installing Tunnelblick, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install Tunnelblick. You will need to download Tunnelblick from this link:
https://giwiki.gi.ucsc.edu/downloads/Tunnelblick.dmg
The username and password to access that web link will be sent to you in your account creation welcome email.
Once you have downloaded the file, navigate in the Finder to wherever you downloaded it to and double click on it. It should open the .dmg file and you should see the Tunnelblick application icon. Double click on the Tunnelblick icon to install. During installation, it will ask you if you want to install for "Only You" or "All Users". Select "Only You".
After installation, in your Finder, you may want to navigate to the Applications folder and drag the Tunnelblick icon to your dock for easy launching.
After launching the software from the Applications folder, you will see a small "tunnel" icon on the top right of your screen. You should be able to click on that icon, then click "Connect prism" to start the VPN. Use the username and temporary password to login to the VPN for the first time. After typing in your username and password, you will be sent a Duo MFA push to your phone. Accept that push, and then you will be connected.
Once you authenticate to the VPN (username/password/MFA), then login via SSH to 'mustard.prism' for example, and you will be asked to change your password. Once you change your password, log out of mustard, then log out of the VPN (click the Tunnelblick icon on the top right of your screen and select "disconnect"). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
308a5eee3bf08b9df8cafc0b1347e8cca1ea9379
567
566
2025-02-09T15:26:31Z
Weiler
3
wikitext
text/x-wiki
For MacOS, you will be installing "Tunnelblick", an OpenVPN client software package for Mac. This software is pre-packaged for our systems and already includes built-in configuration. Do not install this software on public or shared computers!
Before installing Tunnelblick, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install Tunnelblick. You will need to download Tunnelblick from this link:
https://giwiki.gi.ucsc.edu/downloads/Tunnelblick.dmg
The username and password to access that web link will be sent to you in your account creation welcome email.
Once you have downloaded the file, navigate in the Finder to wherever you downloaded it to and double click on it. It should open the .dmg file and you should see the Tunnelblick application icon. Double click on the Tunnelblick icon to install. During installation, it will ask you if you want to install for "Only You" or "All Users". Select "Only You".
After installation, in your Finder, you may want to navigate to the Applications folder and drag the Tunnelblick icon to your dock for easy launching.
After launching the software from the Applications folder, you will see a small "tunnel" icon on the top right of your screen. You should be able to click on that icon, then click "Connect prism" to start the VPN. Use the username and temporary password that we sent to you in your account creation welcome email to login to the VPN for the first time. After typing in your username and password, you will be sent a Duo MFA push to your phone. Accept that push, and then you will be connected.
Once you authenticate to the VPN (username/password/MFA), then login via SSH to 'mustard.prism' for example, and you will be asked to change your password. Once you change your password, log out of mustard, then log out of the VPN (click the Tunnelblick icon on the top right of your screen and select "disconnect"). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
c1e9823d39a9ba0f623523d3f90d29fa5b51dd4c
568
567
2025-02-09T15:34:43Z
Weiler
3
wikitext
text/x-wiki
For MacOS, you will be installing "Tunnelblick", an OpenVPN client software package for Mac. This software is pre-packaged for our systems and already includes built-in configuration. Do not install this software on public or shared computers!
Before installing Tunnelblick, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install Tunnelblick. You will need to download Tunnelblick from this link:
https://giwiki.gi.ucsc.edu/downloads/Tunnelblick.dmg
The username and password to access that web link will be sent to you in your account creation welcome email.
Once you have downloaded the file, navigate in the Finder to wherever you downloaded it to and double click on it. It should open the .dmg file and you should see the Tunnelblick application icon. Double click on the Tunnelblick icon to install. During installation, it will ask you if you want to install for "Only You" or "All Users". Select "Only You".
After installation, in your Finder, you may want to navigate to the Applications folder and drag the Tunnelblick icon to your dock for easy launching.
After launching the software from the Applications folder, you will see a small "tunnel" icon on the top right of your screen. You should be able to click on that icon, then click "Connect prism" to start the VPN. Use the username and temporary password that we sent to you in your account creation welcome email to login to the VPN for the first time. After typing in your username and password, you will be sent a Duo MFA push to your phone. Accept that push, and then you will be connected.
Once you authenticate to the VPN (username/password/MFA), then login via SSH to 'mustard.prism' for example, and you will be asked to change your password.
If you are not familiar with SSH, then you will need to open the "Terminal" application which can be found in your Applications Folder under "Utilities". After launching "Terminal" you will connect to mustard by typing:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally it is also your CruzID username). It will ask you for a password, just type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once logging in successfully to mustard, it will as you to change your password. It will ask for you current password one more time, then it will ask you to choose a new password, which you will need to enter two times. Again, whatever password you choose '''will not''' echo to the screen. You new password must be:
1: At least 10 characters long
2: At least 3 character classes (lowercase, uppercase, number and/or special character)
Once you change your password, it will log you out of mustard, then log out of the VPN (click the Tunnelblick icon on the top right of your screen and select "disconnect"). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password).
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
6ce4e7f88a6cb1a7ad5f5c190dc3198ea4c08047
569
568
2025-02-09T15:35:49Z
Weiler
3
wikitext
text/x-wiki
For MacOS, you will be installing "Tunnelblick", an OpenVPN client software package for Mac. This software is pre-packaged for our systems and already includes built-in configuration. Do not install this software on public or shared computers!
Before installing Tunnelblick, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install Tunnelblick. You will need to download Tunnelblick from this link:
https://giwiki.gi.ucsc.edu/downloads/Tunnelblick.dmg
The username and password to access that web link will be sent to you in your account creation welcome email.
Once you have downloaded the file, navigate in the Finder to wherever you downloaded it to and double click on it. It should open the .dmg file and you should see the Tunnelblick application icon. Double click on the Tunnelblick icon to install. During installation, it will ask you if you want to install for "Only You" or "All Users". Select "Only You".
After installation, in your Finder, you may want to navigate to the Applications folder and drag the Tunnelblick icon to your dock for easy launching.
After launching the software from the Applications folder, you will see a small "tunnel" icon on the top right of your screen. You should be able to click on that icon, then click "Connect prism" to start the VPN. Use the username and temporary password that we sent to you in your account creation welcome email to login to the VPN for the first time. After typing in your username and password, you will be sent a Duo MFA push to your phone. Accept that push, and then you will be connected.
Once you authenticate to the VPN (username/password/MFA), then login via SSH to 'mustard.prism' for example, and you will be asked to change your password.
If you are not familiar with SSH, then you will need to open the "Terminal" application which can be found in your Applications Folder under "Utilities". After launching "Terminal" you will connect to mustard by typing:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally it is also your CruzID username). It will ask you for a password, just type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once logging in successfully to mustard, it will as you to change your password. It will ask for you current password one more time, then it will ask you to choose a new password, which you will need to enter two times. Again, whatever password you choose '''will not''' echo to the screen. You new password must be:
1: At least 10 characters long
2: At least 3 character classes (lowercase, uppercase, number and/or special character)
Once you change your password, it will log you out of mustard. Then, log out of the VPN (click the Tunnelblick icon on the top right of your screen and select "disconnect"). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password).
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
45799b4d35bbaf4d2792712107aed8f5e8445043
570
569
2025-02-09T15:36:42Z
Weiler
3
wikitext
text/x-wiki
For MacOS, you will be installing "Tunnelblick", an OpenVPN client software package for Mac. This software is pre-packaged for our systems and already includes built-in configuration. Do not install this software on public or shared computers!
Before installing Tunnelblick, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install Tunnelblick. You will need to download Tunnelblick from this link:
https://giwiki.gi.ucsc.edu/downloads/Tunnelblick.dmg
The username and password to access that web link will be sent to you in your account creation welcome email.
Once you have downloaded the file, navigate in the Finder to wherever you downloaded it to and double click on it. It should open the .dmg file and you should see the Tunnelblick application icon. Double click on the Tunnelblick icon to install. During installation, it will ask you if you want to install for "Only You" or "All Users". Select "Only You".
After installation, in your Finder, you may want to navigate to the Applications folder and drag the Tunnelblick icon to your dock for easy launching.
After launching Tunnelblick from the Applications folder, you will see a small "tunnel" icon on the top right of your screen. You should be able to click on that icon, then click "Connect prism" to start the VPN. Use the username and temporary password that we sent to you in your account creation welcome email to login to the VPN for the first time. After typing in your username and password, you will be sent a Duo MFA push to your phone. Accept that push, and then you will be connected.
Once you authenticate to the VPN (username/password/MFA), then login via SSH to 'mustard.prism' for example, and you will be asked to change your password.
If you are not familiar with SSH, then you will need to open the "Terminal" application which can be found in your Applications Folder under "Utilities". After launching "Terminal" you will connect to mustard by typing:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally it is also your CruzID username). It will ask you for a password, just type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once logging in successfully to mustard, it will as you to change your password. It will ask for you current password one more time, then it will ask you to choose a new password, which you will need to enter two times. Again, whatever password you choose '''will not''' echo to the screen. You new password must be:
1: At least 10 characters long
2: At least 3 character classes (lowercase, uppercase, number and/or special character)
Once you change your password, it will log you out of mustard. Then, log out of the VPN (click the Tunnelblick icon on the top right of your screen and select "disconnect"). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password).
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
7642b81a00a7ba123f10358df8b5c4538e74aa03
571
570
2025-02-09T15:42:03Z
Weiler
3
wikitext
text/x-wiki
For MacOS, you will be installing "Tunnelblick", an OpenVPN client software package for Mac. This software is pre-packaged for our systems and already includes built-in configuration. Do not install this software on public or shared computers!
Before installing Tunnelblick, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install Tunnelblick. You will need to download Tunnelblick from this link:
https://giwiki.gi.ucsc.edu/downloads/Tunnelblick.dmg
The username and password to access that web link will be sent to you in your account creation welcome email.
Once you have downloaded the file, navigate in the Finder to wherever you downloaded it to and double click on it. It should open the .dmg file and you should see the Tunnelblick application icon. Double click on the Tunnelblick icon to install. During installation, it will ask you if you want to install for "Only You" or "All Users". Select "Only You".
After installation, in your Finder, you may want to navigate to the Applications folder and drag the Tunnelblick icon to your dock for easy launching.
After launching Tunnelblick from the Applications folder, you will see a small "tunnel" icon on the top right of your screen. You should be able to click on that icon, then click "Connect prism" to start the VPN. "Prism" is the name of our firewalled environment. Use the username and temporary password that we sent to you in your account creation welcome email to login to the VPN for the first time. After typing in your username and password, you will be sent a Duo MFA push to your phone. Accept that push, and then you will be connected.
Once you authenticate to the VPN (username/password/MFA), then login via SSH to 'mustard.prism' for example, and you will be asked to change your password.
If you are not familiar with SSH, then you will need to open the "Terminal" application which can be found in your Applications Folder under "Utilities". After launching "Terminal" you will connect to mustard by typing:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally it is also your CruzID username). It will ask you for a password, just type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once logging in successfully to mustard, it will as you to change your password. It will ask for you current password one more time, then it will ask you to choose a new password, which you will need to enter two times. Again, whatever password you choose '''will not''' echo to the screen. You new password must be:
1: At least 10 characters long
2: At least 3 character classes (lowercase, uppercase, number and/or special character)
Once you change your password, it will log you out of mustard. Then, log out of the VPN (click the Tunnelblick icon on the top right of your screen and select "disconnect"). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password).
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
28520e0b13afd3a5c94f568567b7fedef867c5f3
572
571
2025-02-09T15:45:25Z
Weiler
3
wikitext
text/x-wiki
Before following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
For MacOS, you will be installing "Tunnelblick", an OpenVPN client software package for Mac. This software is pre-packaged for our systems and already includes built-in configuration. Do not install this software on public or shared computers!
Before installing Tunnelblick, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install Tunnelblick. You will need to download Tunnelblick from this link:
https://giwiki.gi.ucsc.edu/downloads/Tunnelblick.dmg
The username and password to access that web link will be sent to you in your account creation welcome email.
Once you have downloaded the file, navigate in the Finder to wherever you downloaded it to and double click on it. It should open the .dmg file and you should see the Tunnelblick application icon. Double click on the Tunnelblick icon to install. During installation, it will ask you if you want to install for "Only You" or "All Users". Select "Only You".
After installation, in your Finder, you may want to navigate to the Applications folder and drag the Tunnelblick icon to your dock for easy launching.
After launching Tunnelblick from the Applications folder, you will see a small "tunnel" icon on the top right of your screen. You should be able to click on that icon, then click "Connect prism" to start the VPN. "Prism" is the name of our firewalled environment. Use the username and temporary password that we sent to you in your account creation welcome email to login to the VPN for the first time. After typing in your username and password, you will be sent a Duo MFA push to your phone. Accept that push, and then you will be connected.
Once you authenticate to the VPN (username/password/MFA), then login via SSH to 'mustard.prism' for example, and you will be asked to change your password.
If you are not familiar with SSH, then you will need to open the "Terminal" application which can be found in your Applications Folder under "Utilities". After launching "Terminal" you will connect to mustard by typing:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally it is also your CruzID username). It will ask you for a password, just type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once logging in successfully to mustard, it will as you to change your password. It will ask for you current password one more time, then it will ask you to choose a new password, which you will need to enter two times. Again, whatever password you choose '''will not''' echo to the screen. You new password must be:
1: At least 10 characters long
2: At least 3 character classes (lowercase, uppercase, number and/or special character)
Once you change your password, it will log you out of mustard. Then, log out of the VPN (click the Tunnelblick icon on the top right of your screen and select "disconnect"). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password).
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
8bbee2eb11da4321674e0f9d93be79f54f74320b
573
572
2025-02-09T15:45:49Z
Weiler
3
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
For macOS, you will install "Tunnelblick", an OpenVPN client for the Mac. Our download is pre-packaged for our systems and already includes the built-in configuration. Do not install this software on public or shared computers!
Before installing Tunnelblick, you must have enrolled your cell phone for Duo MFA using your UCSC CruzID account. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire one. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, continue with the Tunnelblick install. Download Tunnelblick from this link:
https://giwiki.gi.ucsc.edu/downloads/Tunnelblick.dmg
The username and password to access that web link will be sent to you in your account creation welcome email.
Once you have downloaded the file, navigate in the Finder to wherever you downloaded it and double-click it. This opens the .dmg file, and you should see the Tunnelblick application icon. Double-click the Tunnelblick icon to install. During installation, it will ask whether you want to install for "Only You" or "All Users". Select "Only You".
After installation, you may want to navigate to the Applications folder in the Finder and drag the Tunnelblick icon to your Dock for easy launching.
After launching Tunnelblick from the Applications folder, you will see a small "tunnel" icon on the top right of your screen. Click that icon, then click "Connect prism" to start the VPN. "Prism" is the name of our firewalled environment. Use the username and temporary password that we sent to you in your account creation welcome email to log in to the VPN for the first time. After typing in your username and password, you will be sent a Duo MFA push to your phone. Accept that push, and then you will be connected.
Once you authenticate to the VPN (username/password/MFA), then login via SSH to 'mustard.prism' for example, and you will be asked to change your password.
If you are not familiar with SSH, then you will need to open the "Terminal" application which can be found in your Applications Folder under "Utilities". After launching "Terminal" you will connect to mustard by typing:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). When asked for a password, type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you log in successfully to mustard, it will ask you to change your password. It will ask for your current password one more time, then it will ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: At least 3 character classes (lowercase, uppercase, number and/or special character)
Once you change your password, it will log you out of mustard. Then, log out of the VPN (click the Tunnelblick icon on the top right of your screen and select "disconnect"). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password).
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
39951321180b920381b429b04eef098e3d094608
574
573
2025-02-09T15:47:12Z
Weiler
3
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions.
On macOS, you will install "Tunnelblick", an OpenVPN client for Mac. This software is pre-packaged for our systems and already includes built-in configuration. Do not install this software on public or shared computers!
Before installing Tunnelblick, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install Tunnelblick. You will need to download Tunnelblick from this link:
https://giwiki.gi.ucsc.edu/downloads/Tunnelblick.dmg
The username and password to access that web link will be sent to you in your account creation welcome email.
Once you have downloaded the file, navigate in the Finder to wherever you downloaded it to and double click on it. It should open the .dmg file and you should see the Tunnelblick application icon. Double click on the Tunnelblick icon to install. During installation, it will ask you if you want to install for "Only You" or "All Users". Select "Only You".
After installation, in your Finder, you may want to navigate to the Applications folder and drag the Tunnelblick icon to your dock for easy launching.
After launching Tunnelblick from the Applications folder, you will see a small "tunnel" icon on the top right of your screen. Click that icon, then click "Connect prism" to start the VPN. "Prism" is the name of our firewalled environment. Use the username and temporary password that we sent to you in your account creation welcome email to log in to the VPN for the first time. After typing in your username and password, you will be sent a Duo MFA push to your phone. Accept that push, and then you will be connected.
Once you authenticate to the VPN (username/password/MFA), then login via SSH to 'mustard.prism' for example, and you will be asked to change your password.
If you are not familiar with SSH, then you will need to open the "Terminal" application which can be found in your Applications Folder under "Utilities". After launching "Terminal" you will connect to mustard by typing:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). When asked for a password, type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you log in successfully to mustard, it will ask you to change your password. It will ask for your current password one more time, then it will ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: At least 3 character classes (lowercase, uppercase, number and/or special character)
Once you change your password, it will log you out of mustard. Then, log out of the VPN (click the Tunnelblick icon on the top right of your screen and select "disconnect"). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password).
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
4cb38a9e5d72e79a9a8bb6ee5d4fe97e61a92d73
575
574
2025-02-09T15:49:42Z
Weiler
3
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions.
On macOS, you will install "Tunnelblick", an OpenVPN client for Mac. This software is pre-packaged for our systems and already includes built-in configuration. Do not install this software on public or shared computers!
Before installing Tunnelblick, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install Tunnelblick. You will need to download Tunnelblick from this link:
https://giwiki.gi.ucsc.edu/downloads/Tunnelblick.dmg
The username and password to access that web link will be sent to you in your account creation welcome email.
Once you have downloaded the file, navigate in the Finder to wherever you downloaded it to and double click on it. It should open the .dmg file and you should see the Tunnelblick application icon. Double click on the Tunnelblick icon to install. During installation, it will ask you if you want to install for "Only You" or "All Users". Select "Only You".
After installation, in your Finder, you may want to navigate to the Applications folder and drag the Tunnelblick icon to your dock for easy launching.
After launching Tunnelblick from the Applications folder, you will see a small "tunnel" icon on the top right of your screen. Click that icon, then click "Connect prism" to start the VPN. "Prism" is the name of our firewalled environment. Use the username and temporary password that we sent to you in your account creation welcome email to log in to the VPN for the first time. After typing in your username and password, you will be sent a Duo MFA push to your phone. Accept that push, and then you will be connected.
Once you authenticate to the VPN (username/password/MFA), then login via SSH to 'mustard.prism' for example, and you will be asked to change your password.
If you are not familiar with SSH, then you will need to open the "Terminal" application which can be found in your Applications Folder under "Utilities". After launching "Terminal" you will connect to mustard by typing:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). When asked for a password, type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you log in successfully to mustard, it will ask you to change your password. It will ask for your current password one more time, then it will ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: At least 3 character classes (lowercase, uppercase, number and/or special character)
Once you change your password, it will log you out of mustard. Then, log out of the VPN (click the Tunnelblick icon on the top right of your screen and select "disconnect"). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password).
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
0b421cbc50a4eff579bc7cbcaec9f5e3c6a3dfe8
576
575
2025-02-09T16:01:55Z
Weiler
3
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions.
On macOS, you will install "Tunnelblick", an OpenVPN client for Mac. This software is pre-packaged for our systems and already includes built-in configuration. Do not install this software on public or shared computers!
Before installing Tunnelblick, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install Tunnelblick. You will need to download Tunnelblick from this link:
https://giwiki.gi.ucsc.edu/downloads/Tunnelblick.dmg
The username and password to access that web link will be sent to you in your account creation welcome email.
Once you have downloaded the file, navigate in the Finder to wherever you downloaded it to and double click on it. It should open the .dmg file and you should see the Tunnelblick application icon. Double click on the Tunnelblick icon to install. During installation, it will ask you if you want to install for "Only You" or "All Users". Select "Only You".
After installation, in your Finder, you may want to navigate to the Applications folder and drag the Tunnelblick icon to your dock for easy launching.
After launching Tunnelblick from the Applications folder, you will see a small "tunnel" icon on the top right of your screen, next to the date and WiFi icon. Click that icon, then click "Connect prism" to start the VPN. "Prism" is the name of our firewalled environment. Use the username and temporary password that we sent to you in your account creation welcome email to log in to the VPN for the first time. After typing in your username and password, you will be sent a Duo MFA push to your phone. Accept that push, and then you will be connected.
Once you authenticate to the VPN (username/password/MFA), then login via SSH to 'mustard.prism' for example, and you will be asked to change your password.
If you are not familiar with SSH, then you will need to open the "Terminal" application which can be found in your Applications Folder under "Utilities". After launching "Terminal" you will connect to mustard by typing:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). When asked for a password, type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you log in successfully to mustard, it will ask you to change your password. It will ask for your current password one more time, then it will ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: At least 3 character classes (lowercase, uppercase, number and/or special character)
Once you change your password, it will log you out of mustard. Then, log out of the VPN (click the Tunnelblick icon on the top right of your screen and select "disconnect"). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password).
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
2e256dad405930979788d7c404221c9e2a52e973
600
576
2025-02-09T21:12:52Z
Weiler
3
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions.
On macOS, you will install "Tunnelblick", an OpenVPN client for Mac. This software is pre-packaged for our systems and already includes built-in configuration. Do not install this software on public or shared computers!
Before installing Tunnelblick, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install Tunnelblick. You will need to download Tunnelblick from this link:
https://giwiki.gi.ucsc.edu/downloads/Tunnelblick.dmg
The username and password to access that web link will be sent to you in your account creation welcome email.
Once you have downloaded the file, navigate in the Finder to wherever you downloaded it to and double click on it. It should open the .dmg file and you should see the Tunnelblick application icon. Double click on the Tunnelblick icon to install. During installation, it will ask you if you want to install for "Only You" or "All Users". Select "Only You".
After installation, in your Finder, you may want to navigate to the Applications folder and drag the Tunnelblick icon to your dock for easy launching.
After launching Tunnelblick from the Applications folder, you will see a small "tunnel" icon on the top right of your screen, next to the date and WiFi icon. Click that icon, then click "Connect prism" to start the VPN. "Prism" is the name of our firewalled environment. Use the username and temporary password that we sent to you in your account creation welcome email to log in to the VPN for the first time. After typing in your username and password, you will be sent a Duo MFA push to your phone. Accept that push, and then you will be connected.
Once you authenticate to the VPN (username/password/MFA), then login via SSH to 'mustard.prism' for example, and you will be asked to change your password.
If you are not familiar with SSH, then you will need to open the "Terminal" application which can be found in your Applications Folder under "Utilities". After launching "Terminal" you will connect to mustard by typing:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). When asked for a password, type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you log in successfully to mustard, it will ask you to change your password. It will ask for your current password one more time, then it will ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: At least 3 character classes (lowercase, uppercase, number and/or special character)
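The two rules above can be checked before you commit to a new password. The helper below is a hypothetical sketch (not part of the official instructions or any UCSC tooling); it simply encodes the length and character-class requirements:

```python
import string

def meets_gi_policy(pw: str) -> bool:
    """Return True if pw is at least 10 chars and uses at least 3 character classes."""
    classes = [
        any(c.islower() for c in pw),              # lowercase
        any(c.isupper() for c in pw),              # uppercase
        any(c.isdigit() for c in pw),              # number
        any(c in string.punctuation for c in pw),  # special character
    ]
    return len(pw) >= 10 and sum(classes) >= 3

print(meets_gi_policy("short1A!"))            # too short -> False
print(meets_gi_policy("longenoughpassword"))  # one class only -> False
print(meets_gi_policy("CorrectHorse7!"))      # long, 4 classes -> True
```

Note that passing this check only means the password satisfies the stated minimums; the server may apply additional checks when you run the change.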
Once you change your password, it will log you out of mustard. Then, log out of the VPN (click the Tunnelblick icon on the top right of your screen and select "disconnect"). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password). Note the following page for available resources:
https://giwiki.gi.ucsc.edu/index.php?title=Firewalled_Computing_Resources_Overview
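If you connect frequently, you may optionally add an entry to the SSH client configuration file at ~/.ssh/config on your Mac. The host alias below is illustrative (any alias works), and "your_cruzid" stands in for the username from your welcome email:

```
Host mustard
    HostName mustard.prism
    User your_cruzid
```

After saving this, typing "ssh mustard" in Terminal is equivalent to "ssh your_cruzid@mustard.prism".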
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
de506f991127584b93b68c8d3112bc28b8c5cc14
Setting Up The VPN on Linux
0
62
578
2025-02-09T16:13:15Z
Weiler
3
Created page with "'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here: [[Requirement_for_users_to_get_GI_VPN_access]] After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions. Most Linux flavors support OpenVPN client software...."
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions.
Most Linux flavors support OpenVPN client software. While the installation process may vary from flavor to flavor, we will be describing the process to get you going for Ubuntu, which should work on most Ubuntu versions and other Ubuntu/Debian derivatives. Do not install this software on public or shared computers!
Before installing our VPN profile, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install our VPN profile. You will need to download our OpenVPN client configuration file from this link:
https://giwiki.gi.ucsc.edu/downloads/prism.ovpn
The username and password to access that web link will be sent to you in your account creation welcome email.
Once you authenticate to the VPN (username/password/MFA), then login via SSH to 'mustard.prism' for example, and you will be asked to change your password.
If you are not familiar with SSH, open a terminal application (on Ubuntu, press Ctrl+Alt+T or search for "Terminal" in your applications). Then connect to mustard by typing:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). When asked for a password, type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you log in successfully to mustard, it will ask you to change your password. It will ask for your current password one more time, then it will ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: At least 3 character classes (lowercase, uppercase, number and/or special character)
Once you change your password, it will log you out of mustard. Then, disconnect from the VPN (in Network Manager, switch the Prism VPN connection off). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, and then you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password).
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
b25de05be62fd95712f5d16212c31af7cb16ab76
579
578
2025-02-09T16:17:29Z
Weiler
3
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions.
Most Linux flavors support OpenVPN client software. While the installation process may vary from flavor to flavor, we will be describing the process to get you going for Ubuntu, which should work on most Ubuntu versions and other Ubuntu/Debian derivatives. Do not install this software on public or shared computers!
Before installing our VPN profile, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install our VPN profile. You will need to download our OpenVPN client configuration file from this link:
https://giwiki.gi.ucsc.edu/downloads/prism.ovpn
The username and password to access that web link will be sent to you in your account creation welcome email.
We will be installing the Prism VPN profile via the Network Manager GUI interface.
Open '''Network Manager''' from the '''GNOME Settings''' app, select the '''Network''' tab, and click the '''+''' symbol next to '''VPN''':
Once you authenticate to the VPN (username/password/MFA), then login via SSH to 'mustard.prism' for example, and you will be asked to change your password.
If you are not familiar with SSH, open a terminal application (on Ubuntu, press Ctrl+Alt+T or search for "Terminal" in your applications). Then connect to mustard by typing:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). When asked for a password, type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you log in successfully to mustard, it will ask you to change your password. It will ask for your current password one more time, then it will ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: At least 3 character classes (lowercase, uppercase, number and/or special character)
Once you change your password, it will log you out of mustard. Then, disconnect from the VPN (in Network Manager, switch the Prism VPN connection off). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, and then you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password).
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
040062acc9f122db8fd1eda20e6d6dcabf405ebe
580
579
2025-02-09T16:17:57Z
Weiler
3
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions.
Most Linux flavors support OpenVPN client software. While the installation process may vary from flavor to flavor, we will be describing the process to get you going for Ubuntu, which should work on most Ubuntu versions and other Ubuntu/Debian derivatives. Do not install this software on public or shared computers!
Before installing our VPN profile, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install our VPN profile. You will need to download our OpenVPN client configuration file from this link:
https://giwiki.gi.ucsc.edu/downloads/prism.ovpn
The username and password to access that web link will be sent to you in your account creation welcome email.
We will be installing the Prism VPN profile via the Network Manager GUI interface.
Open '''Network Manager''' from the '''GNOME Settings''' app, select the '''Network''' tab, and click the '''+''' symbol next to '''VPN''':
Once you authenticate to the VPN (username/password/MFA), then login via SSH to 'mustard.prism' for example, and you will be asked to change your password.
If you are not familiar with SSH, open a terminal application (on Ubuntu, press Ctrl+Alt+T or search for "Terminal" in your applications). Then connect to mustard by typing:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). When asked for a password, type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you log in successfully to mustard, it will ask you to change your password. It will ask for your current password one more time, then it will ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: At least 3 character classes (lowercase, uppercase, number and/or special character)
Once you change your password, it will log you out of mustard. Then, disconnect from the VPN (in Network Manager, switch the Prism VPN connection off). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, and then you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password).
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
edc73beee8e6086dc04c0518513617474e3a381d
581
580
2025-02-09T16:19:36Z
Weiler
3
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions.
Most Linux flavors support OpenVPN client software. While the installation process may vary from flavor to flavor, we will be describing the process to get you going for Ubuntu, which should work on most Ubuntu versions and other Ubuntu/Debian derivatives. Do not install this software on public or shared computers!
Before installing our VPN profile, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install our VPN profile. You will need to download our OpenVPN client configuration file from this link:
https://giwiki.gi.ucsc.edu/downloads/prism.ovpn
The username and password to access that web link will be sent to you in your account creation welcome email.
We will be installing the Prism VPN profile via the Network Manager GUI interface.
Open '''Network Manager''' from the '''Gnome Settings''' menu, select the '''Network''' tab, and click the '''VPN +''' symbol:
[[File:Keypairs.png|900px]]
Once you authenticate to the VPN (username/password/MFA), log in via SSH to 'mustard.prism', for example, and you will be asked to change your password.
If you are not familiar with SSH, open a terminal application (on Ubuntu, press Ctrl+Alt+T or search for "Terminal" in your applications list). After launching the terminal, connect to mustard by typing:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). It will ask you for a password; type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you log in successfully to mustard, it will ask you to change your password. It will ask for your current password one more time, then it will ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: Made up of at least 3 character classes (lowercase, uppercase, number and/or special character)
Once you change your password, it will log you out of mustard. Then, log out of the VPN (toggle the '''On/Off''' button from the Network Manager GUI VPN interface). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, and you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password).
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
482d80f2fbec2ff43a56528ecc1250eed36092dc
586
581
2025-02-09T16:24:00Z
Weiler
3
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions.
Most Linux flavors support OpenVPN client software. While the installation process may vary from flavor to flavor, we will be describing the process to get you going for Ubuntu, which should work on most Ubuntu versions and other Ubuntu/Debian derivatives. Do not install this software on public or shared computers!
Before installing our VPN profile, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install our VPN profile. You will need to download our OpenVPN client configuration file from this link:
https://giwiki.gi.ucsc.edu/downloads/prism.ovpn
The username and password to access that web link will be sent to you in your account creation welcome email.
We will be installing the Prism VPN profile via the Network Manager GUI interface.
Open '''Network Manager''' from the '''Gnome Settings''' menu, select the '''Network''' tab, and click the '''VPN +''' symbol:
[[File:Configuring_1.png|900px]]
Once you authenticate to the VPN (username/password/MFA), log in via SSH to 'mustard.prism', for example, and you will be asked to change your password.
If you are not familiar with SSH, open a terminal application (on Ubuntu, press Ctrl+Alt+T or search for "Terminal" in your applications list). After launching the terminal, connect to mustard by typing:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). It will ask you for a password; type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you log in successfully to mustard, it will ask you to change your password. It will ask for your current password one more time, then it will ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: Made up of at least 3 character classes (lowercase, uppercase, number and/or special character)
Once you change your password, it will log you out of mustard. Then, log out of the VPN (toggle the '''On/Off''' button from the Network Manager GUI VPN interface). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, and you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password).
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
3321a5abb6c24ec1503d782a2e5d6679d617ee17
587
586
2025-02-09T16:24:16Z
Weiler
3
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions.
Most Linux flavors support OpenVPN client software. While the installation process may vary from flavor to flavor, we will be describing the process to get you going for Ubuntu, which should work on most Ubuntu versions and other Ubuntu/Debian derivatives. Do not install this software on public or shared computers!
Before installing our VPN profile, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install our VPN profile. You will need to download our OpenVPN client configuration file from this link:
https://giwiki.gi.ucsc.edu/downloads/prism.ovpn
The username and password to access that web link will be sent to you in your account creation welcome email.
We will be installing the Prism VPN profile via the Network Manager GUI interface.
Open '''Network Manager''' from the '''Gnome Settings''' menu, select the '''Network''' tab, and click the '''VPN +''' symbol:
[[File:Configuring_1.png|600px]]
Once you authenticate to the VPN (username/password/MFA), log in via SSH to 'mustard.prism', for example, and you will be asked to change your password.
If you are not familiar with SSH, open a terminal application (on Ubuntu, press Ctrl+Alt+T or search for "Terminal" in your applications list). After launching the terminal, connect to mustard by typing:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). It will ask you for a password; type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you log in successfully to mustard, it will ask you to change your password. It will ask for your current password one more time, then it will ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: Made up of at least 3 character classes (lowercase, uppercase, number and/or special character)
Once you change your password, it will log you out of mustard. Then, log out of the VPN (toggle the '''On/Off''' button from the Network Manager GUI VPN interface). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, and you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password).
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
50c1709fc1876feb05b3ece61447acfc656369d7
588
587
2025-02-09T16:25:18Z
Weiler
3
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions.
Most Linux flavors support OpenVPN client software. While the installation process may vary from flavor to flavor, we will be describing the process to get you going for Ubuntu, which should work on most Ubuntu versions and other Ubuntu/Debian derivatives. Do not install this software on public or shared computers!
Before installing our VPN profile, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install our VPN profile. You will need to download our OpenVPN client configuration file from this link:
https://giwiki.gi.ucsc.edu/downloads/prism.ovpn
The username and password to access that web link will be sent to you in your account creation welcome email. Save the "prism.ovpn" file locally to your desktop or somewhere else convenient.
We will be installing the Prism VPN profile via the Network Manager GUI interface.
Open '''Network Manager''' from the '''Gnome Settings''' menu, select the '''Network''' tab, and click the '''VPN +''' symbol:
[[File:Configuring_1.png|600px]]
Once you authenticate to the VPN (username/password/MFA), log in via SSH to 'mustard.prism', for example, and you will be asked to change your password.
If you are not familiar with SSH, open a terminal application (on Ubuntu, press Ctrl+Alt+T or search for "Terminal" in your applications list). After launching the terminal, connect to mustard by typing:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). It will ask you for a password; type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you log in successfully to mustard, it will ask you to change your password. It will ask for your current password one more time, then it will ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: Made up of at least 3 character classes (lowercase, uppercase, number and/or special character)
Once you change your password, it will log you out of mustard. Then, log out of the VPN (toggle the '''On/Off''' button from the Network Manager GUI VPN interface). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, and you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password).
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
ddbb7a53785522a95fe73ce6f6f3cef2a77df074
589
588
2025-02-09T16:26:14Z
Weiler
3
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions.
Most Linux flavors support OpenVPN client software. While the installation process may vary from flavor to flavor, we will be describing the process to get you going for Ubuntu, which should work on most Ubuntu versions and other Ubuntu/Debian derivatives. Do not install this software on public or shared computers!
Before installing our VPN profile, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install our VPN profile. You will need to download our OpenVPN client configuration file from this link:
https://giwiki.gi.ucsc.edu/downloads/prism.ovpn
The username and password to access that web link will be sent to you in your account creation welcome email. Save the "prism.ovpn" file locally to your desktop or somewhere else convenient.
We will be installing the Prism VPN profile via the Network Manager GUI interface.
Open '''Network Manager''' from the '''Gnome Settings''' menu, select the '''Network''' tab, and click the '''VPN +''' symbol:
[[File:Configuring_1.png|600px]]
From the '''Add VPN''' window, click on the '''Import from file…''' option:
Once you authenticate to the VPN (username/password/MFA), log in via SSH to 'mustard.prism', for example, and you will be asked to change your password.
If you are not familiar with SSH, open a terminal application (on Ubuntu, press Ctrl+Alt+T or search for "Terminal" in your applications list). After launching the terminal, connect to mustard by typing:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). It will ask you for a password; type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you log in successfully to mustard, it will ask you to change your password. It will ask for your current password one more time, then it will ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: Made up of at least 3 character classes (lowercase, uppercase, number and/or special character)
Once you change your password, it will log you out of mustard. Then, log out of the VPN (toggle the '''On/Off''' button from the Network Manager GUI VPN interface). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, and you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password).
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
bcb6daa26cd63b8f5c9560dc4c732f7df64dbeba
591
589
2025-02-09T16:28:20Z
Weiler
3
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions.
Most Linux flavors support OpenVPN client software. While the installation process may vary from flavor to flavor, we will be describing the process to get you going for Ubuntu, which should work on most Ubuntu versions and other Ubuntu/Debian derivatives. Do not install this software on public or shared computers!
Before installing our VPN profile, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install our VPN profile. You will need to download our OpenVPN client configuration file from this link:
https://giwiki.gi.ucsc.edu/downloads/prism.ovpn
The username and password to access that web link will be sent to you in your account creation welcome email. Save the "prism.ovpn" file locally to your desktop or somewhere else convenient.
We will be installing the Prism VPN profile via the Network Manager GUI interface.
Open '''Network Manager''' from the '''Gnome Settings''' menu, select the '''Network''' tab, and click the '''VPN +''' symbol:
[[File:Configuring_1.png|600px]]
From the '''Add VPN''' window, click on the '''Import from file…''' option:
[[File:Configuring_2.png|600px]]
Once you authenticate to the VPN (username/password/MFA), log in via SSH to 'mustard.prism', for example, and you will be asked to change your password.
If you are not familiar with SSH, open a terminal application (on Ubuntu, press Ctrl+Alt+T or search for "Terminal" in your applications list). After launching the terminal, connect to mustard by typing:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). It will ask you for a password; type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you log in successfully to mustard, it will ask you to change your password. It will ask for your current password one more time, then it will ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: Made up of at least 3 character classes (lowercase, uppercase, number and/or special character)
Once you change your password, it will log you out of mustard. Then, log out of the VPN (toggle the '''On/Off''' button from the Network Manager GUI VPN interface). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, and you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password).
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
188918cc7dace67300a3b36599a3099498a4649d
595
591
2025-02-09T16:35:00Z
Weiler
3
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions.
Most Linux flavors support OpenVPN client software. While the installation process may vary from flavor to flavor, we will be describing the process to get you going for Ubuntu, which should work on most Ubuntu versions and other Ubuntu/Debian derivatives. Do not install this software on public or shared computers!
Before installing our VPN profile, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install our VPN profile. You will need to download our OpenVPN client configuration file from this link:
https://giwiki.gi.ucsc.edu/downloads/prism.ovpn
The username and password to access that web link will be sent to you in your account creation welcome email. Save the "prism.ovpn" file locally to your desktop or somewhere else convenient.
We will be installing the Prism VPN profile via the Network Manager GUI interface.
Open '''Network Manager''' from the '''Gnome Settings''' menu, select the '''Network''' tab, and click the '''VPN +''' symbol:
[[File:Configuring_1.png|600px]]
From the '''Add VPN''' window, click on the '''Import from file...''' option:
[[File:Configuring_2.png|600px]]
Navigate to your .ovpn file (/path/to/your/prism.ovpn) and click the '''Open''' button:
[[File:Configuring_3.png|600px]]
Click on the '''Add''' button:
[[File:Configuring_4.png|600px]]
Finally, click the '''On/Off''' button to start the VPN:
[[File:Configuring_5.png|600px]]
Once you authenticate to the VPN (username/password/MFA), log in via SSH to 'mustard.prism', for example, and you will be asked to change your password.
If you are not familiar with SSH, open a terminal application (on Ubuntu, press Ctrl+Alt+T or search for "Terminal" in your applications list). After launching the terminal, connect to mustard by typing:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). It will ask you for a password; type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you log in successfully to mustard, it will ask you to change your password. It will ask for your current password one more time, then it will ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: Made up of at least 3 character classes (lowercase, uppercase, number and/or special character)
Once you change your password, it will log you out of mustard. Then, log out of the VPN (toggle the '''On/Off''' button from the Network Manager GUI VPN interface). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, and you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password).
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
7d2923e2115e0316311b2f65c5c04162903b7d08
596
595
2025-02-09T16:37:17Z
Weiler
3
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions.
Most Linux flavors support OpenVPN client software. While the installation process may vary from flavor to flavor, we will be describing the process to get you going for Ubuntu, which should work on most Ubuntu versions and other Ubuntu/Debian derivatives. Do not install this software on public or shared computers!
Before installing our VPN profile, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install our VPN profile. You will need to download our OpenVPN client configuration file from this link:
https://giwiki.gi.ucsc.edu/downloads/prism.ovpn
The username and password to access that web link will be sent to you in your account creation welcome email. Save the "prism.ovpn" file locally to your desktop or somewhere else convenient.
We will be installing the Prism VPN profile via the Network Manager GUI interface.
Open '''Network Manager''' from the '''Gnome Settings''' menu, select the '''Network''' tab, and click the '''VPN +''' symbol:
[[File:Configuring_1.png|600px]]
From the '''Add VPN''' window, click on the '''Import from file...''' option:
[[File:Configuring_2.png|600px]]
Navigate to your .ovpn file (/path/to/your/prism.ovpn) and click the '''Open''' button:
[[File:Configuring_3.png|600px]]
Click on the '''Add''' button:
[[File:Configuring_4.png|600px]]
Finally, click the '''On/Off''' button to start the VPN:
[[File:Configuring_5.png|600px]]
Once you authenticate to the VPN (username/password/MFA), log in via SSH to 'mustard.prism', for example, and you will be asked to change your password.
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). It will ask you for a password; type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you log in successfully to mustard, it will ask you to change your password. It will ask for your current password one more time, then it will ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: Made up of at least 3 character classes (lowercase, uppercase, number and/or special character)
Once you change your password, it will log you out of mustard. Then, log out of the VPN (toggle the '''On/Off''' button from the Network Manager GUI VPN interface). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password).
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
a64f5f98c679750eb7e6156884e8168e7e480afc
602
596
2025-02-09T21:13:27Z
Weiler
3
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions.
Most Linux flavors support OpenVPN client software. While the installation process may vary from flavor to flavor, we will be describing the process to get you going for Ubuntu, which should work on most Ubuntu versions and other Ubuntu/Debian derivatives. Do not install this software on public or shared computers!
Before installing our VPN profile, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install our VPN profile. You will need to download our OpenVPN client configuration file from this link:
https://giwiki.gi.ucsc.edu/downloads/prism.ovpn
The username and password to access that web link will be sent to you in your account creation welcome email. Save the "prism.ovpn" file locally to your desktop or somewhere else convenient.
We will be installing the Prism VPN profile via the Network Manager GUI interface.
Open '''Network Manager''' from the '''Gnome Settings''' menu, select the '''Network''' tab, and click the '''VPN +''' symbol:
[[File:Configuring_1.png|600px]]
From the '''Add VPN''' window, click on the '''Import from file...''' option:
[[File:Configuring_2.png|600px]]
Navigate to your .ovpn file (/path/to/your/prism.ovpn) and click the '''Open''' button:
[[File:Configuring_3.png|600px]]
Click on the '''Add''' button:
[[File:Configuring_4.png|600px]]
Finally, click the '''On/Off''' button to start the VPN:
[[File:Configuring_5.png|600px]]
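If you prefer the command line, the GUI steps above can be sketched with nmcli, Network Manager's CLI tool that ships with Ubuntu. The save path for prism.ovpn and the resulting connection name "prism" are assumptions; adjust them to match where you saved the file and the name Network Manager assigns on import. The sketch is guarded so it does nothing on a machine without nmcli or the profile file.

```shell
# CLI sketch of the GUI steps above (assumes the profile was saved
# as ~/Desktop/prism.ovpn -- adjust to your actual location).
OVPN_FILE="$HOME/Desktop/prism.ovpn"
if command -v nmcli >/dev/null 2>&1 && [ -f "$OVPN_FILE" ]; then
    # Import the profile: the CLI equivalent of "Import from file..."
    nmcli connection import type openvpn file "$OVPN_FILE"
    # Bring the VPN up: the CLI equivalent of the On/Off toggle.
    # Prompts for username/password/MFA as configured in the profile.
    nmcli connection up prism
fi
```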
Once you authenticate to the VPN (username/password/MFA), log in via SSH to 'mustard.prism', for example, and you will be asked to change your password.
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). It will ask you for a password; type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you log in successfully to mustard, it will ask you to change your password. It will ask for your current password one more time, then it will ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: Made up of at least 3 character classes (lowercase, uppercase, number and/or special character)
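If you want to sanity-check a candidate password against these two rules before typing it at the prompt, the policy can be expressed as a small Python sketch (the function name is illustrative, not part of any UCSC tooling):

```python
import re

def meets_password_rules(password: str) -> bool:
    """Return True if the password is at least 10 characters long
    and contains at least 3 of the 4 character classes."""
    if len(password) < 10:
        return False
    classes = [
        r"[a-z]",          # lowercase
        r"[A-Z]",          # uppercase
        r"[0-9]",          # number
        r"[^a-zA-Z0-9]",   # special character
    ]
    return sum(1 for c in classes if re.search(c, password)) >= 3

print(meets_password_rules("correct-Horse-7"))  # True: 15 chars, 4 classes
print(meets_password_rules("abcdefghij"))       # False: 10 chars but 1 class
```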
Once you change your password, it will log you out of mustard. Then, log out of the VPN (toggle the '''On/Off''' button from the Network Manager GUI VPN interface). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password). Note the following page for available resources:
https://giwiki.gi.ucsc.edu/index.php?title=Firewalled_Computing_Resources_Overview
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
42bd162dd74df0f89f55ab6730927a5ec1a7327e
File:Configuring 1.png
6
63
582
2025-02-09T16:20:45Z
Weiler
3
wikitext
text/x-wiki
da39a3ee5e6b4b0d3255bfef95601890afd80709
Quick Start Instructions to Get Rolling with OpenStack
0
26
583
331
2025-02-09T16:21:05Z
Weiler
3
/* Upload your SSH Public Key */
wikitext
text/x-wiki
__TOC__
==Request an OpenStack Account==
Once you have [http://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access PRISM/GI VPN access], you can request an OpenStack account. You will need to send an email to cluster-admin@soe.ucsc.edu asking for access, and let us know which lab you are in, or who your PI is, so we can place you in the right OpenStack group.
==Create a SSH Public/Private Keypair==
To log into an OpenStack VM instance, you will need an SSH public key. The key is "injected" into the instance upon creation, and only someone with that key (i.e. you) will be able to log in via SSH initially. If you already have an SSH public and private key that you use elsewhere, you can use that one, and can skip to the next step. If you don't have an SSH keypair set up yet, then you will need to log into the UNIX-compatible machine you will be '''logging in from''' (a Mac/Apple computer will also work), and then run the 'ssh-keygen' command. If you are behind the VPN, you can first log into mustard, crimson or razzmatazz, which are Linux servers. The command will look something like this:
$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/public/home/frank/.ssh/id_rsa):
Created directory '/public/home/frank/.ssh'.
Enter passphrase (empty for no passphrase): [JUST HIT ENTER]
Enter same passphrase again: [JUST HIT ENTER]
Your identification has been saved in /public/home/frank/.ssh/id_rsa.
Your public key has been saved in /public/home/frank/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:dhJG1A3gcwj7Mz17ommt3NIczMVVgrzp8Tf6F1X4jpI
The key's randomart image is:
+---[RSA 2048]----+
| ..+o.+ ..o.|
| = .. + o..|
| . * .. + ..|
| o = * o|
| So+o + o.|
| . =+oE ooo|
| +o.....o|
| .o++o . .|
| .=o. ...|
+----[SHA256]-----+
You will then have a new directory, "~/.ssh", and inside that directory you will have a file called "id_rsa.pub". That is your SSH public key. You will need this in the next step in order to set up your key in OpenStack.
==Log In To giCloud==
Once you have been notified that your account has been set up and have been given login credentials, connect to the VPN and then go to this link in your favorite web browser, which is the login page:
http://gicloud.prism
To log in, enter your username and password. You will also see a "Domain" field; just enter the word "default" for the domain. Click "Log In". You will be logged into your group's summary page.
==Upload your SSH Public Key==
After creating your new key in the above "Create a SSH Public/Private Keypair" step, you will need to upload that key into OpenStack. Once you are logged in, on the left hand navigation menu, click "Project", then in the submenu, select "Compute", and finally select "Key Pairs". It should take you to the "Key Pairs" window as shown here.
[[File:Keypairs.png|900px]]
Next click the "Import Public Key" button on the top right of the window. In the resulting window, name your key in the "Key Pair Name" field. Name it something descriptive like "laptop-key" if the key is on your laptop, or "mustard-key" if you are logged into mustard, etc.
'''Your key must be an RSA key!''' The newer ED25519 keys '''do not work''' with our version of OpenStack.
To get your key, open a terminal window and type "cat ~/.ssh/id_rsa.pub" to print your full key, like so:
$ cat ~/.ssh/id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAyVKNfdBbDIk7Iq8JmL+u3vxAn4M1iaQgMU5tHJhMSAYBZEZRLZAFc+Qovxe5zzs1ixte9lCipLep39q2I4U8XND17nYliZ4HVM4MW4GsMUfKsgX2FI3mB2vAQ9pZSLkAhTg2D+92uALUSSv1cDZhTqo7DuPRX2Upxyd5QbRL6TRFswBjHz2vY/JpaPQm1S1d10mokPpmxehLfwp0mVgmz1Uv/6FflqiZ68DhDN67cs1yQgWYXQ01IHPjzTKRwCuZVkgT99rkoqy6TkAyrvsfzYPZbIA2y+ovOBzq6WCUT9gp5Jx/UE6CxLSmAuGPAJkV5D/twKIe75xc+5jdi3I1cgKw== user@laptop
Copy that whole line, starting with "ssh-rsa" all the way through the very last character, including the "user@laptop" bit (which may be different for you; just be sure to include it when you copy the line).
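If the import fails, a quick way to sanity-check that you copied a complete, well-formed line is a small Python sketch like the following (the `looks_like_rsa_pubkey` helper is ours, purely illustrative):

```python
import base64

def looks_like_rsa_pubkey(line: str) -> bool:
    """Rough check that a pasted line is a complete one-line
    OpenSSH RSA public key: 'ssh-rsa <base64 blob> [comment]'."""
    parts = line.strip().split()
    if len(parts) < 2 or parts[0] != "ssh-rsa":
        return False
    try:
        blob = base64.b64decode(parts[1])
    except Exception:
        return False  # truncated or corrupted base64
    # The decoded blob begins with the length-prefixed string "ssh-rsa".
    return blob[4:11] == b"ssh-rsa"
```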
Then, back in the OpenStack Key Pair dialogue window, paste the key into the "Public Key" field, then click "Import Key". The key should then appear in the key list.
==Launch a New Instance==
We are now ready to launch our new VM instance. On the left navigation menu, select "Project", then in the submenu, select "Compute", and finally select "Instances". You will see any currently running instances in your group in the resulting screen.
Next you need to click the "Launch Instance" button on the top right. You will be put into the "Details" tab in the instance creation dialogue. You need to choose an instance name and enter it into the "Instance Name" field. It should include your username as a prefix so that others know who owns each instance. Something like "frank-newtest1" would work well. You can ignore the "Description" field, "Availability Zone" should be "nova" and "Count" should be "1".
Next click the "Source" tab on the left. In the "Source" menu, in the "Select Boot Source" field, select "Image" and, next to it, select "No" for "Create New Volume". Then, in the list of images below, choose your image and click the little "Up Arrow" icon to its right to add it.
Next click the "Flavor" tab on the left. In that menu, choose how much CPU, RAM and disk space you want for your new VM. Some images have minimum requirements, and as such some of the smaller flavors may not be available. Select your flavor by clicking the little "Up Arrow" icon on the right of your flavor.
Next click the "Key Pair" tab on the left. Click the little "Up Arrow" to the right of the Key Pair you created in the previous step.
Ignore the rest of the options on the left, you have configured all you need to launch the instance. Click the blue "Launch Instance" button on the bottom right of your window, as seen below:
[[File:Launch.png|850px]]
You will be taken back to the Instances Summary page and you should see your new instance launching. After a bit, your instance will change from "Spawning" to "Running". This means the instance is now booting, and it should finish booting in a minute or two. In the meantime, we will need to attach a "Floating IP" address to your instance so that you can SSH into it. On the right side of your running instance, you should see a drop-down menu; usually the "Create Snapshot" option is pre-selected. Click the drop-down menu arrow to open that menu, and select "Associate Floating IP".
In the "Associate Floating IP" dialogue, click the drop down menu to see if any IP addresses are already available, and if so, go ahead and select one. If there are none available, click the little "+" button to the right to allocate a floating IP address. It will ask you what Pool to use, select "ext-net". You can put in a description if you want but most folks leave that field blank. Then click "Allocate IP". It will take you back one menu level. It will have a field "Port to be Associated", just leave that alone with the default that is already there. Click the blue "Associate" button on the bottom right of the window.
You will be returned to the "Instances Summary" page again. You will see your instance running, and it should now list a "Floating IP" that it is running under. That is the IP that you will use to SSH to the instance.
==Connect to Your New Instance==
Now that your instance is up and running, let's SSH to it and get going! '''From the computer you created your SSH keys on,''' SSH to your instance using the username as the OS type you chose (ubuntu, centos, etc), and the Floating IP address your instance has. '''You must be connected to the VPN for this to work!''' Example:
$ ssh ubuntu@10.50.100.67
If you launched a CentOS instance, it would instead be "ssh centos@10.50.100.67", as appropriate. Assuming everything went as planned, you will be logged into your new Linux instance as the "ubuntu" or "centos" user, which is an unprivileged user; you do, however, have full sudo rights to do whatever administration you need to do. If you get a "Connection Refused" error when trying to SSH in, it means your instance isn't quite through launching yet; try again in about 30 seconds. At this point it is assumed you have a few systems administration skills under your belt, or at least some time to query Google as to how to perform various Linux tasks as necessary. Your instance has full access to the greater Internet, so you can download things from the Internet, run "apt-get install" or "yum update" or whatever is appropriate, and install any software you need to get your work done.
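The "Connection Refused while still booting" window can also be handled programmatically. Here is a minimal Python sketch that polls the instance's SSH port until it accepts connections (the `wait_for_ssh` helper and the example IP are purely illustrative):

```python
import socket
import time

def wait_for_ssh(host: str, port: int = 22,
                 timeout: float = 120.0, interval: float = 5.0) -> bool:
    """Poll until the given host accepts TCP connections on the SSH
    port, covering the window where a fresh instance is still booting."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=5):
                return True  # port is open; SSH should work now
        except OSError:
            time.sleep(interval)  # not up yet; wait and retry
    return False

# Example (use the Floating IP from your Instances page):
# if wait_for_ssh("10.50.100.67"):
#     print("instance reachable; try: ssh ubuntu@10.50.100.67")
```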
'''NOTE:''' You are the Systems Administrator of your instance - we cannot support questions on how to administer Linux for you. If OpenStack itself is having issues then please let us know, but please defer questions like "How do I install software on Ubuntu" to Google searches.
==Storage on Your New Instance==
Most of your storage on your new instance will be located in the /mnt directory, as seen by a "df -h" command on your instance:
ubuntu@erich1:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             16G     0   16G   0% /dev
tmpfs           3.2G  676K  3.2G   1% /run
/dev/vda1        20G  975M   19G   5% /
tmpfs            16G     0   16G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/vda15      105M  3.4M  102M   4% /boot/efi
/dev/vdb1       1.0T  1.1G 1023G   1% /mnt
tmpfs           3.2G     0  3.2G   0% /run/user/1000
Notice that "/mnt" has 1TB of disk space, so store all your big important data in /mnt. Avoid storing data on "/" whenever possible to prevent issues with the root filesystem filling up. The exact amount of storage available will depend on what flavor you chose when creating the instance.
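If you want to check free space from a script before writing large files, a minimal Python sketch using the standard library (the `free_gib` helper and the 50 GiB threshold are just examples):

```python
import shutil

def free_gib(path: str = "/mnt") -> float:
    """Free space on the filesystem containing `path`, in GiB,
    mirroring the 'Avail' column of `df -h`."""
    return shutil.disk_usage(path).free / 2**30

# e.g. refuse to start a big download if /mnt is nearly full:
# if free_gib("/mnt") < 50:
#     raise SystemExit("less than 50 GiB free on /mnt")
```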
==Instance Control Options==
Just a few notes on controlling your instances. They are fully functioning Linux machines, so a "sudo reboot" will reboot the machine, "sudo poweroff" will shut it down, etc. In cloud parlance, "Shut Down" means the instance is still there but the power is off. "Terminated" means it's fully deleted and is unrecoverable, so be sure you want to delete your instance before you do so. We do not back instances up. We also have no access to your instance so we cannot log in and see what's going on.
You can control your instance in several ways from the OpenStack web interface, in the Instance Summary page. On the right side of your instance in the list will be that little drop down menu. Options of interest are:
'''1: Create Snapshot'''
Never use this option as we have not implemented snapshotting in this environment.
'''2: View Log'''
This will show you the boot/console log of the instance, so you can see if anything is causing issues.
'''3: Hard Reboot Instance'''
This will hard reboot your instance, kind of like hitting the power button to power the instance off, after which it will power back on moments later. Useful if your instance is hung because of a software crash or similar.
'''4: Delete Instance'''
This will permanently destroy your instance. It will be deleted and is unrecoverable. It will also free up the resources it was using so that others can use them. This is useful if the group quotas have been reached and some old instances need to be cleaned out to make room for new ones.
'''5: Start Instance'''
This option will be available if the instance is in the "Shut Down" state. It will boot up the instance when this option is invoked.
Do not use the other options you may see there, most have not been implemented in our deployment of OpenStack.
==Changing Your OpenStack Web Interface Password==
Once you have logged in to the Web Interface, you can change your password by doing the following.
On the top right of the OpenStack web interface, you should see a little icon with your username on it. Click that icon to expand the drop down menu there, and select "Settings". Then in the next window, on the left navigation bar, you should see the "Change Password" button. Complete the Change Password dialogue to change your password. You may have to log in again after changing your password.
==Networking==
Your instances are connected at 10Gb/s to each other and the Internet. Of course, actual transfer speeds will likely vary based on disk speed, the speed of the location you are transferring data to or from, and other factors.
Your instance will be located in a private network that can only be seen by other instances in your group. Other OpenStack groups are logically separated into their own networks and your instance cannot route to them. Also, no one can access your instance unless they have a VPN account with us, so your instances are completely fenced off from the Greater Internet inbound, which means you are largely secure against script kiddies and hackers. You are able to connect outbound from your instances.
==Etiquette==
There is one main thing to remember when using instances in OpenStack. When you create an instance, it uses CPU, RAM and, most importantly, it pins disk space for that instance. If you use up all the disk, CPU and RAM quota for your group, then others have no resources left to create their own instances. The best plan of action is to fire up your VM, keep it up while you need it, then copy your data off and delete the instance. Document the steps taken to create your instance so that you could do it again if you needed to. If the physical node that your instance resides on blows up, then your instance is lost forever and we have no backups, so it is up to you to back up important data. It's also not good form to spin up an instance, store data there, and not log in for months at a time; then you are pinning resources that others may need for urgent work. Try to be a good neighbor!
b68aa4ef63fcdce18d6681873003a3723bbe0868
585
584
2025-02-09T16:23:18Z
Weiler
3
Reverted edits by [[Special:Contributions/Weiler|Weiler]] ([[User talk:Weiler|talk]]) to last revision by [[User:Anovak|Anovak]]
wikitext
text/x-wiki
__TOC__
==Request an OpenStack Account==
Once you have [http://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access PRISM/GI VPN access], you can request an OpenStack account. You will need to send an email to cluster-admin@soe.ucsc.edu asking for access, and let us know which lab you are in, or who your PI is, so we can place you in the right OpenStack group.
==Create a SSH Public/Private Keypair==
To log into an OpenStack VM instance, you will need a SSH public key. The key is "injected" into the instance upon creation, and only someone with that key (i.e. you) will be able to log in via SSH initially. If you already have a SSH public and private key that you use elsewhere, you can use that one, and can skip to the next step. If you don't have a SSH keypair set up yet, then you will need to log into the UNIX compatible machine you will be '''logging in from''' (a Mac/Apple computer will also work), and then run the 'ssh-keygen' command. If you are behind the VPN, you can first log into mustard, crimson or razzmatazz, which are linux servers. The command will look something like this:
$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/public/home/frank/.ssh/id_rsa):
Created directory '/public/home/frank/.ssh'.
Enter passphrase (empty for no passphrase): [JUST HIT ENTER]
Enter same passphrase again: [JUST HIT ENTER]
Your identification has been saved in /public/home/frank/.ssh/id_rsa.
Your public key has been saved in /public/home/frank/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:dhJG1A3gcwj7Mz17ommt3NIczMVVgrzp8Tf6F1X4jpI
The key's randomart image is:
+---[RSA 2048]----+
| ..+o.+ ..o.|
| = .. + o..|
| . * .. + ..|
| o = * o|
| So+o + o.|
| . =+oE ooo|
| +o.....o|
| .o++o . .|
| .=o. ...|
+----[SHA256]-----+
You will then have a new directory, "~/.ssh", and inside that directory you will have a file called "id_rsa.pub". That is your SSH public key. You will need this in the next step in order to set up your key in OpenStack.
==Log In To giCloud==
Once you have been notified that your account has been set up and have been given login credentials, connect to the VPN and then go to this link in your favorite web browser, which is the login page:
http://gicloud.prism
To login, enter your username and password. Also you will see a "Domain" field, just enter the word "default" for the domain. Click "Log In". You will be logged into your group's summary page.
==Upload your SSH Public Key==
After creating your new key in the above "Create a SSH Public/Private Keypair" step, you will need to upload that key into OpenStack. Once you are logged in, on the left hand navigation menu, click "Project", then in the submenu, select "Compute", and finally select "Key Pairs". It should take you to the "Key Pairs" window as shown here.
[[File:Keypairs.png|900px]]
Next click the "Import Public Key" button on the top right of the window. In the resulting window, name your key in the "Key Pair Name" field. Name it something descriptive like "laptop-key" if the key is on your laptop, or "mustard-key" if you are logged into mustard, etc.
'''Your key must be an RSA key!''' The newer ED25519 keys '''do not work''' with our version of OpenStack.
To get your key, open a terminal window and type "cat ~/.ssh/id_rsa.pub" to get your full key, as so:
$ cat ~/.ssh/id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAyVKNfdBbDIk7Iq8JmL+u3vxAn4M1iaQgMU5tHJhMSAYBZEZRLZAFc+Qovxe5zzs1ixte9lCipLep39q2I4U8XND17nYliZ4HVM4MW4GsMUfKsgX2FI3mB2vAQ9pZSLkAhTg2D+92uALUSSv1cDZhTqo7DuPRX2Upxyd5QbRL6TRFswBjHz2vY/JpaPQm1S1d10mokPpmxehLfwp0mVgmz1Uv/6FflqiZ68DhDN67cs1yQgWYXQ01IHPjzTKRwCuZVkgT99rkoqy6TkAyrvsfzYPZbIA2y+ovOBzq6WCUT9gp5Jx/UE6CxLSmAuGPAJkV5D/twKIe75xc+5jdi3I1cgKw== user@laptop
Copy that whole line, starting with "ssh-rsa" all the way through the very last character, including the "user@laptop" bit (which may be different for you; just be sure to include it when you copy the line).
Then, back in the OpenStack Key Pair dialogue window, paste the key into the "Public Key" field and click "Import Key". The key should then appear in the key list.
==Launch a New Instance==
We are now ready to launch our new VM instance. On the left navigation menu, select "Project", then in the submenu, select "Compute", and finally select "Instances". You will see any currently running instances in your group in the resulting screen.
Next you need to click the "Launch Instance" button on the top right. You will be put into the "Details" tab of the instance creation dialogue. Choose an instance name and enter it into the "Instance Name" field. It should include your username as a prefix so that others know who owns each instance; something like "frank-newtest1" would work well. You can ignore the "Description" field; "Availability Zone" should be "nova" and "Count" should be "1".
Next click the "Source" tab on the left. In the "Source" menu, in the "Select Boot Source" field, select "Image", and next to it select "No" for "Create New Volume". Then, in the list of images below, find the image you want and click the little "Up Arrow" icon to its right to add it.
Next click the "Flavor" tab on the left. In that menu, choose how much CPU, RAM and disk space you want for your new VM. Some images have minimum requirements, and as such some of the smaller flavors may not be available. Select your flavor by clicking the little "Up Arrow" icon on the right of your flavor.
Next click the "Key Pair" tab on the left. Click the little "Up Arrow" icon to the right of the Key Pair you created in the earlier "Upload your SSH Public Key" step.
Ignore the rest of the options on the left; you have configured all you need to launch the instance. Click the blue "Launch Instance" button on the bottom right of your window, as seen below:
[[File:Launch.png|850px]]
You will be taken back to the Instances Summary page, where you should see your new instance launching. After a bit, your instance will change from "Spawning" to "Running". This means the instance is now booting, and it should finish booting in a minute or two. In the meantime, we need to attach a "Floating IP" address to your instance so that you can SSH into it. On the right side of your running instance you should see a drop-down menu, usually with the "Create Snapshot" option pre-selected. Click the drop-down menu arrow to open that menu and select "Associate Floating IP".
In the "Associate Floating IP" dialogue, click the drop-down menu to see if any IP addresses are already available, and if so, select one. If none are available, click the little "+" button to the right to allocate a floating IP address. It will ask you which Pool to use; select "ext-net". You can put in a description if you want, but most folks leave that field blank. Then click "Allocate IP". This takes you back one menu level, where the "Port to be Associated" field will already be filled in; leave it at its default. Click the blue "Associate" button on the bottom right of the window.
You will be returned to the "Instances Summary" page again. You will see your instance running, and it should now list a "Floating IP" that it is running under. That is the IP that you will use to SSH to the instance.
==Connect to Your New Instance==
Now that your instance is up and running, let's SSH to it and get going! '''From the computer you created your SSH keys on,''' SSH to your instance using the OS type you chose as the username (ubuntu, centos, etc.) and the Floating IP address of your instance. '''You must be connected to the VPN for this to work!''' Example:
$ ssh ubuntu@10.50.100.67
If you launched a CentOS instance, the command would instead be "ssh centos@10.50.100.67", as appropriate. Assuming everything went as planned, you will be logged into your new Linux instance as the "ubuntu" or "centos" user, which is an unprivileged user; you do, however, have full sudo rights to perform whatever administration you need. If you get a "Connection Refused" error when trying to SSH in, your instance isn't quite through launching yet; try again in about 30 seconds. At this point it is assumed you have a few systems administration skills under your belt, or at least some time to query Google on how to perform various Linux tasks as necessary. Your instance has full outbound access to the greater Internet, so you can download things from the Internet, run "apt-get install" or "yum update" or whatever is appropriate, and install any software you need to get your work done.
'''NOTE:''' You are the Systems Administrator of your instance - we cannot support questions on how to administer Linux for you. If OpenStack itself is having issues then please let us know, but please defer questions like "How do I install software on Ubuntu" to Google searches.
==Storage on Your New Instance==
Most of your storage on your new instance will be located in the /mnt directory, as seen by a "df -h" command on your instance:
ubuntu@erich1:~$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 16G 0 16G 0% /dev
tmpfs 3.2G 676K 3.2G 1% /run
/dev/vda1 20G 975M 19G 5% /
tmpfs 16G 0 16G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/vda15 105M 3.4M 102M 4% /boot/efi
/dev/vdb1 1.0T 1.1G 1023G 1% /mnt
tmpfs 3.2G 0 3.2G 0% /run/user/1000
Notice that "/mnt" has 1TB of disk space, so store all your large, important data in /mnt. Avoid storing data on "/" whenever possible to prevent the root filesystem from filling up. The exact amount of storage available will depend on the flavor you chose when creating the instance.
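One simple pattern is to keep a working directory on the large /mnt filesystem and reach it through a symlink in your home directory. A minimal sketch, assuming a fresh Ubuntu instance where /mnt is root-owned (the "projects" name is just an example):

```shell
# /mnt is typically root-owned on a fresh instance, so create and chown a
# subdirectory first ("projects" is an illustrative name).
sudo mkdir -p /mnt/projects
sudo chown "$USER" /mnt/projects

# Link it into your home directory for convenience; anything written to
# ~/projects then lands on the big /mnt filesystem.
ln -s /mnt/projects ~/projects
```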
==Instance Control Options==
Just a few notes on controlling your instances. They are fully functioning Linux machines, so a "sudo reboot" will reboot the machine, "sudo poweroff" will shut it down, etc. In cloud parlance, "Shut Down" means the instance is still there but the power is off. "Terminated" means it's fully deleted and is unrecoverable, so be sure you want to delete your instance before you do so. We do not back instances up. We also have no access to your instance so we cannot log in and see what's going on.
You can control your instance in several ways from the OpenStack web interface, in the Instance Summary page. On the right side of your instance in the list will be that little drop down menu. Options of interest are:
'''1: Create Snapshot'''
Never use this option as we have not implemented snapshotting in this environment.
'''2: View Log'''
This will show you the boot/console log of the instance, so you can see if anything is causing issues.
'''3: Hard Reboot Instance'''
This will hard reboot your instance, kind of like hitting the power button: the instance powers off, then powers back on moments later. Useful if your instance is hung because of a software crash or similar.
'''4: Delete Instance'''
This will permanently destroy your instance. It will be deleted and is unrecoverable. It will also free up the resources it was using so that others can use them. This is useful if the group quotas have been reached and some old instances need to be cleaned out to make room for new ones.
'''5: Start Instance'''
This option will be available if the instance is in the "Shut Down" state. It will boot up the instance when this option is invoked.
Do not use the other options you may see there, most have not been implemented in our deployment of OpenStack.
==Changing Your OpenStack Web Interface Password==
Once you have logged in to the Web Interface, you can change your password by doing the following.
On the top right of the OpenStack web interface, you should see a little icon with your username on it. Click that icon to expand the drop down menu there, and select "Settings". Then in the next window, on the left navigation bar, you should see the "Change Password" button. Complete the Change Password dialogue to change your password. You may have to log in again after changing your password.
==Networking==
Your instances are connected at 10Gb/s to each other and to the Internet. Of course, actual transfer speeds will likely vary based on disk speed, the speed of the system you are transferring data to or from, and other factors.
Your instance will be located in a private network that can only be seen by other instances in your group. Other OpenStack groups are logically separated into their own networks and your instance cannot route to them. Also, no one can access your instance unless they have a VPN account with us, so your instances are completely fenced off from the Greater Internet inbound, which means you are largely secure against script kiddies and hackers. You are able to connect outbound from your instances.
==Etiquette==
There is one main thing to remember when using instances in OpenStack. When you create an instance, it uses CPU and RAM and, most importantly, it pins disk space for that instance. If you use up all the disk, CPU and RAM quota for your group, then others have no resources left to create their own instances. The best plan of action is to fire up your VM when you need it, keep it up while you are using it, then copy your data off and delete the instance. Document the steps taken to create your instance so that you could do it again if you needed to. If the physical node that your instance resides on blows up, your instance is lost forever and we have no backups, so it is up to you to back up important data. It's also not good form to spin up an instance, store data there, and then not log in for months at a time; that pins resources that others may need for urgent work. Try to be a good neighbor!
2479bff28d9ffcd56d1c9d26d22a0966edf5f15e
File:Configuring 2.png
6
64
590
2025-02-09T16:27:45Z
Weiler
3
wikitext
text/x-wiki
da39a3ee5e6b4b0d3255bfef95601890afd80709
File:Configuring 3.png
6
65
592
2025-02-09T16:31:41Z
Weiler
3
wikitext
text/x-wiki
da39a3ee5e6b4b0d3255bfef95601890afd80709
File:Configuring 4.png
6
66
593
2025-02-09T16:31:53Z
Weiler
3
wikitext
text/x-wiki
da39a3ee5e6b4b0d3255bfef95601890afd80709
File:Configuring 5.png
6
67
594
2025-02-09T16:32:07Z
Weiler
3
wikitext
text/x-wiki
da39a3ee5e6b4b0d3255bfef95601890afd80709
Setting Up The VPN on Windows
0
68
598
2025-02-09T17:28:43Z
Weiler
3
Created page with "'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here: [[Requirement_for_users_to_get_GI_VPN_access]] After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions. We will be installing OpenVPN Connect client for Wind..."
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions.
We will be installing OpenVPN Connect client for Windows. This VPN client currently supports '''Windows 10''' and '''Windows 11'''. Do not install this software on public or shared computers!
Before installing our VPN profile, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install our VPN profile. You will need to download our OpenVPN client configuration file from this link:
https://giwiki.gi.ucsc.edu/downloads/prism.ovpn
The username and password to access that web link will be sent to you in your account creation welcome email. Save the "prism.ovpn" file locally to your desktop or somewhere else convenient.
After downloading the OpenVPN configuration file, you will need to download the OpenVPN Connect client from here:
https://openvpn.net/community-downloads/
Download the '''Windows 64-bit MSI installer'''. Double click on the Installer to begin installation, and follow the on-screen prompts to complete installation.
Once the installation is complete, launch the '''OpenVPN Connect''' application. Review and agree to the '''Data Usage Policy'''.
After opening the app, we will need to import the '''prism.ovpn''' file you downloaded earlier. Click '''File''' and browse to the location of your '''prism.ovpn''' file. Import the file.
You should now be able to select the new profile and click connect. It should ask you for a username and password, which we will have sent you in our welcome email. After entering the username and password, you will receive a Duo Push on your phone to complete authentication. The OpenVPN status icon will appear in your system tray on the bottom right of your desktop. If it is yellow or red, you are not connected; if it is green, you are connected.
Once you have authenticated to the VPN (username/password/MFA), log in via SSH to 'mustard.prism', for example, and you will be asked to change your password.
You can access SSH via several common Windows SSH clients such as PuTTY. You can also use SSH right from a Command Prompt or PowerShell window (do a Windows search for "command" or "powershell" to find them). Once you have launched one of those applications, do:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). It will ask you for a password; just type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you have logged in to mustard successfully, it will ask you to change your password. It will ask for your current password one more time, then it will ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: At least 3 character classes (lowercase, uppercase, number and/or special character)
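Those two rules can be sanity-checked locally with a quick shell snippet. This is only an illustrative approximation of the rules above, not the server's exact policy, and the password shown is purely hypothetical:

```shell
# Count how many character classes a candidate password contains and apply
# the two rules above (length >= 10, at least 3 classes).
pw='Example-Pass12'   # hypothetical password, 14 characters, 4 classes
classes=0
case "$pw" in *[a-z]*) classes=$((classes+1)) ;; esac
case "$pw" in *[A-Z]*) classes=$((classes+1)) ;; esac
case "$pw" in *[0-9]*) classes=$((classes+1)) ;; esac
case "$pw" in *[!a-zA-Z0-9]*) classes=$((classes+1)) ;; esac

if [ "${#pw}" -ge 10 ] && [ "$classes" -ge 3 ]; then
    echo "password meets both rules"
else
    echo "password is too weak"
fi
```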
Once you change your password, it will log you out of mustard. Then, log out of the VPN (select "Disconnect" from the '''OpenVPN Connect''' application). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password). See the following page for available resources:
https://giwiki.gi.ucsc.edu/index.php?title=Firewalled_Computing_Resources_Overview
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
64f1983bfa1c177892aa496dfdc9db1434b71e3d
Multi Factor Authentication (MFA) Frequently Asked Questions
0
69
628
610
2025-02-11T22:29:47Z
Weiler
3
wikitext
text/x-wiki
__TOC__
== Why Do We Need MFA To Login To The VPN? ==
We need to comply with '''NIST 800-171''' Security Standards in order to store data downloaded from NIH, according to new regulations. '''NIST 800-171''' controls require that we enable MFA for VPN logins to harden our security posture. MFA is a good idea security-wise anyway, even though it can be somewhat annoying.
== What Kind of MFA System Are We Using here at the GI? ==
We are using '''Duo Mobile''' as our MFA authentication mechanism. You probably are already using it to authenticate to CruzID related systems.
== Do I need a CruzID before I can use Duo Mobile at the Genomics Institute? ==
Yes, you do. If you do not yet have a CruzID, please ask your sponsor or PI to get you a CruzID set up. You will need this active before you can authenticate to the Genomics Institute VPN.
== Duo is Working To Login to the GI VPN, But It Is Calling My Phone Instead of Sending Me a Push! What Can I Do? ==
If you previously set up Duo to send you a text with a code, or to call you to authenticate, '''and you would prefer to just receive a Push Notification instead''', you can do it by logging in here:
https://cruzid.ucsc.edu/idmuser_login
Use your CruzID Gold username and password. You may get a call or text with MFA stuff in it as usual, but don't act on that yet. During the Duo notice that pops up in your web browser that says "Verify your identity..." there will be a small link below that which says '''Other Options'''. Click that, and from there you should be able to change the way in which Duo MFA authenticates you, enroll a new device (like a phone) by selecting '''Manage Devices''', etc.
== All This Documentation References "Duo Push", but I use Duo by Another Method...? ==
Most of our users use Duo in the context of getting a "Push", i.e. when you enter your username and password, you get a Push Request on your phone and click the green "Accept" button to finish authentication. But there are a few cases where that is not possible. If the Push option of Duo authentication is not possible, you can utilize the "Rolling Code" option or the "Yubikey" option. For the "Rolling Code" option, you must have already enrolled Duo on your phone. Then open the Duo App on your phone and click the "UC Santa Cruz" option. It should show you a six digit passcode. Then, when you authenticate to the Genomics Institute VPN, type your username in the "Username" field on your VPN client, then in the password field, type your password, then a comma, then the six digit code you see in the Duo App.
For example, if my credentials were:
username: bob
password: C@ndyIsFun
and I looked on my phone and saw my six digit code in the Duo App as "643726", I would enter these credentials:
username: bob
password: C@ndyIsFun,643726
And that would authenticate me. Same with a Yubikey. In the "Password" field, just type in your password followed by a comma, followed by the code on your Yubikey.
f55820aa1e53e2090bfcd3ab8f75192939ffbf91
629
628
2025-02-11T22:33:26Z
Weiler
3
/* All This Documentation References "Duo Push", but I use Duo by Another Method...? */
wikitext
text/x-wiki
__TOC__
== Why Do We Need MFA To Login To The VPN? ==
We need to comply with '''NIST 800-171''' Security Standards in order to store data downloaded from NIH, according to new regulations. '''NIST 800-171''' controls require we enable MFA for VPN logins to harden our security posture. MFA is a good idea, security-wise, anyway though! Even though it can be somewhat annoying.
== What Kind of MFA System Are We Using here at the GI? ==
We are using '''Duo Mobile''' as our MFA authentication mechanism. You probably are already using it to authenticate to CruzID related systems.
== Do I need a CruzID before I can use Duo Mobile at the Genomics Institute? ==
Yes, you do. If you do not yet have a CruzID, please ask your sponsor or PI to get you a CruzID set up. You will need this active before you can authenticate to the Genomics Institute VPN.
== Duo is Working To Login to the GI VPN, But It Is Calling My Phone Instead of Sending Me a Push! What Can I Do? ==
If you previously set up Duo to send you a text with a code, or to call you to authenticate, '''and you would prefer to just receive a Push Notification instead''', you can do it by logging in here:
https://cruzid.ucsc.edu/idmuser_login
Use your CruzID Gold username and password. You may get a call or text with MFA stuff in it as usual, but don't act on that yet. During the Duo notice that pops up in your web browser that says "Verify your identity..." there will be a small link below that which says '''Other Options'''. Click that, and from there you should be able to change the way in which Duo MFA authenticates you, enroll a new device (like a phone) by selecting '''Manage Devices''', etc.
== All This Documentation References "Duo Push", but I use Duo by Another Method...? ==
Most of our users use Duo in the context of getting a "Push", i.e. when you enter your username and password, you get a Push Request on your phone and click the green "Accept" button to finish authentication. But there are a few cases where that is not possible. If the Push option of Duo authentication is not possible, you can utilize the "Rolling Code" option or the "Yubikey" option. For the "Rolling Code" option, you must have already enrolled Duo on your phone. Then open the Duo App on your phone and click the "UC Santa Cruz" option. It should show you a six digit passcode. Then, when you authenticate to the Genomics Institute VPN, type your username in the "Username" field on your VPN client, then in the password field, type your password, then a comma, then the six digit code you see in the Duo App.
*** Rolling Code Option:
For example, if my credentials were:
username: bob
password: C@ndyIsFun
and I looked on my phone and saw my six digit code in the Duo App as "643726", I would enter these credentials:
username: bob
password: C@ndyIsFun,643726
And that would authenticate me.
*** Yubikey Option
It's the same idea with a Yubikey. In the "Password" field, just type in your password followed by a comma, followed by the code on your Yubikey.
26a14b1a0b0d033c8f3890aa041b29774acbc3fc
630
629
2025-02-11T22:33:44Z
Weiler
3
/* All This Documentation References "Duo Push", but I use Duo by Another Method...? */
wikitext
text/x-wiki
__TOC__
== Why Do We Need MFA To Login To The VPN? ==
We need to comply with '''NIST 800-171''' Security Standards in order to store data downloaded from NIH, according to new regulations. '''NIST 800-171''' controls require we enable MFA for VPN logins to harden our security posture. MFA is a good idea, security-wise, anyway though! Even though it can be somewhat annoying.
== What Kind of MFA System Are We Using here at the GI? ==
We are using '''Duo Mobile''' as our MFA authentication mechanism. You probably are already using it to authenticate to CruzID related systems.
== Do I need a CruzID before I can use Duo Mobile at the Genomics Institute? ==
Yes, you do. If you do not yet have a CruzID, please ask your sponsor or PI to get you a CruzID set up. You will need this active before you can authenticate to the Genomics Institute VPN.
== Duo is Working To Login to the GI VPN, But It Is Calling My Phone Instead of Sending Me a Push! What Can I Do? ==
If you previously set up Duo to send you a text with a code, or to call you to authenticate, '''and you would prefer to just receive a Push Notification instead''', you can do it by logging in here:
https://cruzid.ucsc.edu/idmuser_login
Use your CruzID Gold username and password. You may get a call or text with MFA stuff in it as usual, but don't act on that yet. During the Duo notice that pops up in your web browser that says "Verify your identity..." there will be a small link below that which says '''Other Options'''. Click that, and from there you should be able to change the way in which Duo MFA authenticates you, enroll a new device (like a phone) by selecting '''Manage Devices''', etc.
== All This Documentation References "Duo Push", but I use Duo by Another Method...? ==
Most of our users use Duo in the context of getting a "Push", i.e. when you enter your username and password, you get a Push Request on your phone and click the green "Accept" button to finish authentication. But there are a few cases where that is not possible. If the Push option of Duo authentication is not possible, you can utilize the "Rolling Code" option or the "Yubikey" option. For the "Rolling Code" option, you must have already enrolled Duo on your phone. Then open the Duo App on your phone and click the "UC Santa Cruz" option. It should show you a six digit passcode. Then, when you authenticate to the Genomics Institute VPN, type your username in the "Username" field on your VPN client, then in the password field, type your password, then a comma, then the six digit code you see in the Duo App.
=Rolling Code Option=
For example, if my credentials were:
username: bob
password: C@ndyIsFun
and I looked on my phone and saw my six digit code in the Duo App as "643726", I would enter these credentials:
username: bob
password: C@ndyIsFun,643726
And that would authenticate me.
*** Yubikey Option
It's the same idea with a Yubikey. In the "Password" field, just type in your password followed by a comma, followed by the code on your Yubikey.
42bf0430afd882d5cb49c75fc3a42533c225684d
631
630
2025-02-11T22:34:16Z
Weiler
3
wikitext
text/x-wiki
__TOC__
== Why Do We Need MFA To Login To The VPN? ==
We need to comply with '''NIST 800-171''' Security Standards in order to store data downloaded from NIH, according to new regulations. '''NIST 800-171''' controls require we enable MFA for VPN logins to harden our security posture. MFA is a good idea, security-wise, anyway though! Even though it can be somewhat annoying.
== What Kind of MFA System Are We Using here at the GI? ==
We are using '''Duo Mobile''' as our MFA authentication mechanism. You probably are already using it to authenticate to CruzID related systems.
== Do I need a CruzID before I can use Duo Mobile at the Genomics Institute? ==
Yes, you do. If you do not yet have a CruzID, please ask your sponsor or PI to get you a CruzID set up. You will need this active before you can authenticate to the Genomics Institute VPN.
== Duo is Working To Login to the GI VPN, But It Is Calling My Phone Instead of Sending Me a Push! What Can I Do? ==
If you previously set up Duo to send you a text with a code, or to call you to authenticate, '''and you would prefer to just receive a Push Notification instead''', you can do it by logging in here:
https://cruzid.ucsc.edu/idmuser_login
Use your CruzID Gold username and password. You may get a call or text with MFA stuff in it as usual, but don't act on that yet. During the Duo notice that pops up in your web browser that says "Verify your identity..." there will be a small link below that which says '''Other Options'''. Click that, and from there you should be able to change the way in which Duo MFA authenticates you, enroll a new device (like a phone) by selecting '''Manage Devices''', etc.
== All This Documentation References "Duo Push", but I use Duo by Another Method...? ==
Most of our users use Duo in the context of getting a "Push", i.e. when you enter your username and password, you get a Push Request on your phone and click the green "Accept" button to finish authentication. But there are a few cases where that is not possible. If the Push option of Duo authentication is not possible, you can utilize the "Rolling Code" option or the "Yubikey" option. For the "Rolling Code" option, you must have already enrolled Duo on your phone. Then open the Duo App on your phone and click the "UC Santa Cruz" option. It should show you a six digit passcode. Then, when you authenticate to the Genomics Institute VPN, type your username in the "Username" field on your VPN client, then in the password field, type your password, then a comma, then the six digit code you see in the Duo App.
'''Rolling Code Option'''
For example, if my credentials were:
username: bob
password: C@ndyIsFun
and I looked on my phone and saw my six digit code in the Duo App as "643726", I would enter these credentials:
username: bob
password: C@ndyIsFun,643726
And that would authenticate me.
'''Yubikey Option'''
It's the same idea with a Yubikey. In the "Password" field, just type in your password followed by a comma, followed by the code on your Yubikey.
1ef90471e76807eb378f45448dd099390a5a59f0
632
631
2025-02-11T22:35:59Z
Weiler
3
/* All This Documentation References "Duo Push", but I use Duo by Another Method...? */
wikitext
text/x-wiki
__TOC__
== Why Do We Need MFA To Login To The VPN? ==
Under new federal regulations, we must comply with the '''NIST 800-171''' security standard in order to store data downloaded from the NIH. '''NIST 800-171''' controls require MFA for VPN logins to harden our security posture. Even though it can be somewhat annoying, MFA is good security practice in any case.
== What Kind of MFA System Are We Using here at the GI? ==
We are using '''Duo Mobile''' as our MFA mechanism. You are probably already using it to authenticate to CruzID-related systems.
== Do I need a CruzID before I can use Duo Mobile at the Genomics Institute? ==
Yes, you do. If you do not yet have a CruzID, please ask your sponsor or PI to have one set up for you. Your CruzID must be active before you can authenticate to the Genomics Institute VPN.
== Duo Works for Logging in to the GI VPN, but It Calls My Phone Instead of Sending Me a Push! What Can I Do? ==
If you previously set up Duo to text you a code or to call you to authenticate, '''and you would prefer to receive a push notification instead''', you can change this by logging in here:
https://cruzid.ucsc.edu/idmuser_login
Use your CruzID Gold username and password. You may receive a call or text with an MFA code as usual, but do not act on it yet. Below the Duo prompt that pops up in your web browser saying "Verify your identity..." there is a small link labeled '''Other Options'''. Click it, and from there you can change how Duo MFA authenticates you, enroll a new device (such as a phone) by selecting '''Manage Devices''', and so on.
== All This Documentation References "Duo Push", but I use Duo by Another Method...? ==
Most of our users use Duo by getting a "Push": when you enter your username and password, you receive a push request on your phone and tap the green "Accept" button to finish authenticating. In the few cases where a push is not possible, you can use the "Rolling Code" option or the "Yubikey" option instead. For the "Rolling Code" option, you must already have enrolled Duo on your phone. Open the Duo app on your phone and tap the "UC Santa Cruz" entry; it will show you a six-digit passcode. Then, when you authenticate to the Genomics Institute VPN, type your username in the "Username" field of your VPN client and, in the password field, type your password, then a comma, then the six-digit code shown in the Duo app.
'''Rolling Code Option'''
For example, if my credentials were:
username: bob
password: C@ndyIsFun
and the six-digit code shown in my Duo app was "643726", I would enter these credentials:
username: bob
password: C@ndyIsFun,643726
And that would authenticate me.
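The concatenation rule above is simple enough to express as a tiny helper. This is just an illustrative sketch (the function name is ours, not part of any official tool); it builds the value you would type into the "Password" field:

```python
def vpn_password_field(password: str, code: str) -> str:
    """Build the VPN "Password" field value: your password, a comma,
    then the six-digit Duo rolling code (or Yubikey code)."""
    return f"{password},{code}"

# Bob's example from above:
print(vpn_password_field("C@ndyIsFun", "643726"))  # prints C@ndyIsFun,643726
```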
'''Yubikey Option'''
It is the same idea with a Yubikey: in the "Password" field, type your password, followed by a comma, followed by the code generated by your Yubikey.
53d738d6aedcb63127ffe8d03cf80961696c8d36
Setting Up The VPN on MacOS
0
61
640
623
2025-02-13T19:06:00Z
Weiler
3
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions.
For MacOS, you will be installing "Tunnelblick", an OpenVPN client software package for Mac. Do not install this software on public or shared computers!
Before installing Tunnelblick, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install Tunnelblick.
Download the OpenVPN configuration file we will be using. The username and password to access this web link should have been sent to you in your account creation welcome email:
https://giwiki.gi.ucsc.edu/downloads/prism-duo.ovpn
Download that file by right-clicking the link above, selecting "Save Link As...", and saving it to your Desktop or some other location you will remember.
Next, you will need to download Tunnelblick (the latest Stable Version) from this link:
https://tunnelblick.net/downloads.html
Once you have downloaded Tunnelblick, double-click on it and proceed through the installation steps. At the end it will ask if you have any configuration files, say "Yes" and click "OK".
After installation, in your Finder, you may want to navigate to the Applications folder and drag the Tunnelblick icon to your dock for easy launching.
After launching Tunnelblick from the Applications folder, you will see a small "tunnel" icon on the top right of your screen, next to the date and WiFi icon. Drag the configuration file ('''prism-duo.ovpn''') on your Desktop to the little Tunnelblick icon on the top right of your screen to install the new profile. It will ask you to type in your laptop password (do that). It will also ask you if you want to install for "Only You" or "All Users". Select "Only You".
You should be able to click on that icon again in the top right, then click "Connect prism-duo" from the resulting menu to start the VPN. "Prism" is the name of our firewalled environment. Use the username and temporary password that we sent to you in your account creation welcome email to login to the VPN for the first time. After typing in your username and password, you will be sent a Duo MFA push to your phone. Accept that push, and then you will be connected.
Once you have authenticated to the VPN (username/password/MFA), log in via SSH to 'mustard.prism', for example, and you will be asked to change your password.
If you are not familiar with SSH, then you will need to open the "Terminal" application which can be found in your Applications Folder under "Utilities". After launching "Terminal" you will connect to mustard by typing:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). When it asks for a password, type the password we sent you in your account creation welcome email. As you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you log in successfully to mustard, it will ask you to change your password. It will ask for your current password one more time, then ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: At least 3 character classes (lowercase, uppercase, number and/or special character)
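As a quick way to sanity-check a candidate password against the two rules above, here is a small sketch. This is our own helper for illustration only, not an official tool, and the actual server-side check may differ slightly:

```python
def meets_gi_policy(pw: str) -> bool:
    """Check the password rules above: at least 10 characters and at
    least 3 of the 4 character classes (lowercase, uppercase, digit,
    special character)."""
    classes = [
        any(c.islower() for c in pw),      # lowercase
        any(c.isupper() for c in pw),      # uppercase
        any(c.isdigit() for c in pw),      # digit
        any(not c.isalnum() for c in pw),  # special character
    ]
    return len(pw) >= 10 and sum(classes) >= 3

print(meets_gi_policy("C@ndyIsFun"))  # True: 10 chars, lower + upper + special
print(meets_gi_policy("candyisfun"))  # False: only one character class
```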
Once you change your password, it will log you out of mustard. '''Then, log out of the VPN''' (click the Tunnelblick icon on the top right of your screen and select "disconnect"). This step is very important. Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password). Note the following page for available resources:
https://giwiki.gi.ucsc.edu/index.php?title=Firewalled_Computing_Resources_Overview
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
4ab124504b7dcfb75e4fe8a409ce2a77942e8205
Setting Up The VPN on Windows
0
68
606
601
2025-02-10T21:21:25Z
Weiler
3
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions.
We will be installing OpenVPN Connect client for Windows. This VPN client currently supports '''Windows 10''' and '''Windows 11'''. Do not install this software on public or shared computers!
Before installing our VPN profile, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install our VPN profile. You will need to download our OpenVPN client configuration file from this link:
https://giwiki.gi.ucsc.edu/downloads/prism.ovpn
The username and password to access that web link will be sent to you in your account creation welcome email. Save the "prism.ovpn" file locally to your desktop or somewhere else convenient.
After downloading the OpenVPN configuration file, you will need to download the OpenVPN Connect client from here:
https://openvpn.net/community-downloads/
Download the '''Windows 64-bit MSI installer'''. Double click on the Installer to begin installation, and follow the on-screen prompts to complete installation.
Once the installation is complete, launch the '''OpenVPN Connect''' application. Review and agree to the '''Data Usage Policy'''.
After opening the app, you will need to import the '''prism.ovpn''' file you downloaded earlier. Click '''File''', browse to the location of your '''prism.ovpn''' file, and import it.
You should now be able to select the new profile and click connect. It should ask you for a username and password, which we will have sent you in our welcome email. After entering the username and password, you will receive a Duo Push on your phone in order to complete authentication. The OpenVPN status icon will appear in your system tray on the bottom right of your desktop. If it is yellow or red, that indicates you are not connected. If it is green, that indicates you are connected.
Once you authenticate to the VPN (username/password/MFA), log in via SSH to 'mustard.prism', for example, and you will be asked to change your password.
You can access SSH via several common Windows SSH clients, such as PuTTY. You can also use SSH right from a Command Prompt or PowerShell window (do a Windows search for "command" or "powershell" to find them). Once you have launched one of those applications, run:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). It will ask you for a password; type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you log in successfully to mustard, it will ask you to change your password. It will ask for your current password one more time, then it will ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: At least 3 character classes (lowercase, uppercase, number and/or special character)
Once you change your password, it will log you out of mustard. Then, log out of the VPN (select "Disconnect" from the '''OpenVPN Connect''' application). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password). Note the following page for available resources:
https://giwiki.gi.ucsc.edu/index.php?title=Firewalled_Computing_Resources_Overview
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
1a45eebcb147564c92faf9177f9c9b2f9e8fc45f
624
606
2025-02-11T00:19:01Z
Weiler
3
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions.
We will be installing OpenVPN Connect client for Windows. This VPN client currently supports '''Windows 10''' and '''Windows 11'''. Do not install this software on public or shared computers!
Before installing our VPN profile, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install our VPN profile. You will need to download our OpenVPN client configuration file from this link:
https://giwiki.gi.ucsc.edu/downloads/prism.ovpn
The username and password to access that web link will be sent to you in your account creation welcome email. Save the "prism.ovpn" file locally to your desktop or somewhere else convenient.
After downloading the OpenVPN configuration file, you will need to download the OpenVPN Connect client from here:
https://openvpn.net/community-downloads/
Download the '''Windows 64-bit MSI installer'''. Double click on the Installer to begin installation, and follow the on-screen prompts to complete installation.
Once the installation is complete, launch the '''OpenVPN Connect''' application. Review and agree to the '''Data Usage Policy'''.
After opening the app, we will need to import the '''prism.ovpn''' file you downloaded earlier. Click '''File''' and browse to the location of your '''prism.ovpn''' file. Import the file.
You should now be able to select the new profile and click connect. It should ask you for a username and password, which we will have sent you in our welcome email. After entering the username and password, you will receive a Duo Push on your phone in order to complete authentication. The OpenVPN status icon will appear in your system tray on the bottom right of your desktop. If it is yellow or red, that indicates you are not connected. If it is green, that indicates you are connected.
Once you authenticate to the VPN (username/password/MFA), log in via SSH to 'mustard.prism', for example, and you will be asked to change your password.
You can access SSH via several common Windows SSH clients, such as PuTTY. You can also use SSH right from a Command Prompt or PowerShell window (do a Windows search for "command" or "powershell" to find them). Once you have launched one of those applications, run:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). It will ask you for a password; type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you log in successfully to mustard, it will ask you to change your password. It will ask for your current password one more time, then it will ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: At least 3 character classes (lowercase, uppercase, number and/or special character)
Once you change your password, it will log you out of mustard. '''Then, log out of the VPN''' (select "Disconnect" from the '''OpenVPN Connect''' application). This step is very important! Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password). Note the following page for available resources:
https://giwiki.gi.ucsc.edu/index.php?title=Firewalled_Computing_Resources_Overview
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
5c7f8bc783c46180eca5f789b6cdecbba3fd080e
633
624
2025-02-12T22:14:28Z
Weiler
3
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions.
We will be installing OpenVPN Connect client for Windows. This VPN client currently supports '''Windows 10''' and '''Windows 11'''. Do not install this software on public or shared computers!
Before installing our VPN profile, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install our VPN profile. You will need to download our OpenVPN client configuration file from this link:
https://giwiki.gi.ucsc.edu/downloads/prism.ovpn
The username and password to access that web link will be sent to you in your account creation welcome email. Save the "prism.ovpn" file locally to your desktop or somewhere else convenient.
After downloading the OpenVPN configuration file, you will need to download the OpenVPN Connect client from here:
https://openvpn.net/client/
Download the '''Windows Installer'''. Double click on the Installer to begin installation, and follow the on-screen prompts to complete installation.
Once the installation is complete, launch the '''OpenVPN Connect''' application. Review and agree to the '''Data Usage Policy'''.
After opening the app, we will need to import the '''prism.ovpn''' file you downloaded earlier. Click '''File''' and browse to the location of your '''prism.ovpn''' file. Import the file.
You should now be able to select the new profile and click connect. It should ask you for a username and password, which we will have sent you in our welcome email. After entering the username and password, you will receive a Duo Push on your phone in order to complete authentication. The OpenVPN status icon will appear in your system tray on the bottom right of your desktop. If it is yellow or red, that indicates you are not connected. If it is green, that indicates you are connected.
Once you authenticate to the VPN (username/password/MFA), log in via SSH to 'mustard.prism', for example, and you will be asked to change your password.
You can access SSH via several common Windows SSH clients, such as PuTTY. You can also use SSH right from a Command Prompt or PowerShell window (do a Windows search for "command" or "powershell" to find them). Once you have launched one of those applications, run:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). It will ask you for a password; type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you log in successfully to mustard, it will ask you to change your password. It will ask for your current password one more time, then it will ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: At least 3 character classes (lowercase, uppercase, number and/or special character)
Once you change your password, it will log you out of mustard. '''Then, log out of the VPN''' (select "Disconnect" from the '''OpenVPN Connect''' application). This step is very important! Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password). Note the following page for available resources:
https://giwiki.gi.ucsc.edu/index.php?title=Firewalled_Computing_Resources_Overview
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
9294f5067167d5bc78f4565417fa2d8a24f852bb
641
633
2025-02-13T19:06:35Z
Weiler
3
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions.
We will be installing OpenVPN Connect client for Windows. This VPN client currently supports '''Windows 10''' and '''Windows 11'''. Do not install this software on public or shared computers!
Before installing our VPN profile, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install our VPN profile. You will need to download our OpenVPN client configuration file from this link:
https://giwiki.gi.ucsc.edu/downloads/prism.ovpn
The username and password to access that web link will be sent to you in your account creation welcome email. Download that file by right-clicking on the link above and selecting "Save Link As...", and save it to your Desktop or some other area you will remember.
After downloading the OpenVPN configuration file, you will need to download the OpenVPN Connect client from here:
https://openvpn.net/client/
Download the '''Windows Installer'''. Double click on the Installer to begin installation, and follow the on-screen prompts to complete installation.
Once the installation is complete, launch the '''OpenVPN Connect''' application. Review and agree to the '''Data Usage Policy'''.
After opening the app, we will need to import the '''prism.ovpn''' file you downloaded earlier. Click '''File''' and browse to the location of your '''prism.ovpn''' file. Import the file.
You should now be able to select the new profile and click connect. It should ask you for a username and password, which we will have sent you in our welcome email. After entering the username and password, you will receive a Duo Push on your phone in order to complete authentication. The OpenVPN status icon will appear in your system tray on the bottom right of your desktop. If it is yellow or red, that indicates you are not connected. If it is green, that indicates you are connected.
Once you authenticate to the VPN (username/password/MFA), log in via SSH to 'mustard.prism', for example, and you will be asked to change your password.
You can access SSH via several common Windows SSH clients, such as PuTTY. You can also use SSH right from a Command Prompt or PowerShell window (do a Windows search for "command" or "powershell" to find them). Once you have launched one of those applications, run:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). It will ask you for a password; type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you log in successfully to mustard, it will ask you to change your password. It will ask for your current password one more time, then it will ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: At least 3 character classes (lowercase, uppercase, number and/or special character)
Once you change your password, it will log you out of mustard. '''Then, log out of the VPN''' (select "Disconnect" from the '''OpenVPN Connect''' application). This step is very important! Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password). Note the following page for available resources:
https://giwiki.gi.ucsc.edu/index.php?title=Firewalled_Computing_Resources_Overview
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
580520e78f279d2f442ef260a9f499bdd09a05ce
649
641
2025-02-21T00:38:26Z
Weiler
3
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions.
We will be installing OpenVPN Connect client for Windows. This VPN client currently supports '''Windows 10''' and '''Windows 11'''. Do not install this software on public or shared computers!
Before installing our VPN profile, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install our VPN profile. You will need to download our OpenVPN client configuration file from this link:
https://giwiki.gi.ucsc.edu/downloads/
The username and password to access that web link will be sent to you in your account creation welcome email. Download the '''prism.ovpn''' file by right-clicking on the link on the website above and selecting "Save Link As...", and save it to your Desktop or some other area you will remember.
After downloading the OpenVPN configuration file, you will need to download the OpenVPN Connect client from here:
https://openvpn.net/client/
Download the '''Windows Installer'''. Double click on the Installer to begin installation, and follow the on-screen prompts to complete installation.
Once the installation is complete, launch the '''OpenVPN Connect''' application. Review and agree to the '''Data Usage Policy'''.
After opening the app, we will need to import the '''prism.ovpn''' file you downloaded earlier. Click '''File''' and browse to the location of your '''prism.ovpn''' file. Import the file.
You should now be able to select the new profile and click connect. It should ask you for a username and password, which we will have sent you in our welcome email. After entering the username and password, you will receive a Duo Push on your phone in order to complete authentication. The OpenVPN status icon will appear in your system tray on the bottom right of your desktop. If it is yellow or red, that indicates you are not connected. If it is green, that indicates you are connected.
Once you authenticate to the VPN (username/password/MFA), log in via SSH to 'mustard.prism', for example, and you will be asked to change your password.
You can access SSH via several common Windows SSH clients, such as PuTTY. You can also use SSH right from a Command Prompt or PowerShell window (do a Windows search for "command" or "powershell" to find them). Once you have launched one of those applications, run:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). It will ask you for a password; type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you log in successfully to mustard, it will ask you to change your password. It will ask for your current password one more time, then it will ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: At least 3 character classes (lowercase, uppercase, number and/or special character)
Once you change your password, it will log you out of mustard. '''Then, log out of the VPN''' (select "Disconnect" from the '''OpenVPN Connect''' application). This step is very important! Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password). Note the following page for available resources:
https://giwiki.gi.ucsc.edu/index.php?title=Firewalled_Computing_Resources_Overview
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
0f0eae1b2929cfcde206da05fbcc1dd5d40f5f49
651
649
2025-02-21T00:41:22Z
Weiler
3
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions.
We will be installing OpenVPN Connect client for Windows. This VPN client currently supports '''Windows 10''' and '''Windows 11'''. Do not install this software on public or shared computers!
Before installing our VPN profile, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install our VPN profile. You will need to download our OpenVPN client configuration file from this link:
https://giwiki.gi.ucsc.edu/downloads/prism-duo.ovpn
The username and password to access that web link will be sent to you in your account creation welcome email. Download the '''prism-duo.ovpn''' file by right-clicking on the link above and selecting "Save Link As...", and save it to your Desktop or some other area you will remember.
After downloading the OpenVPN configuration file, you will need to download the OpenVPN Connect client from here:
https://openvpn.net/client/
Download the '''Windows Installer'''. Double click on the Installer to begin installation, and follow the on-screen prompts to complete installation.
Once the installation is complete, launch the '''OpenVPN Connect''' application. Review and agree to the '''Data Usage Policy'''.
After opening the app, we will need to import the '''prism-duo.ovpn''' file you downloaded earlier. Click '''File''' and browse to the location of your '''prism-duo.ovpn''' file. Import the file.
You should now be able to select the new profile and click connect. It should ask you for a username and password, which we will have sent you in our welcome email. After entering the username and password, you will receive a Duo Push on your phone in order to complete authentication. The OpenVPN status icon will appear in your system tray on the bottom right of your desktop. If it is yellow or red, that indicates you are not connected. If it is green, that indicates you are connected.
Once you authenticate to the VPN (username/password/MFA), log in via SSH to 'mustard.prism', for example, and you will be asked to change your password.
You can access SSH via several common Windows SSH clients, such as PuTTY. You can also use SSH right from a Command Prompt or PowerShell window (do a Windows search for "command" or "powershell" to find them). Once you have launched one of those applications, run:
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). It will ask you for a password; type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you log in successfully to mustard, it will ask you to change your password. It will ask for your current password one more time, then it will ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: At least 3 character classes (lowercase, uppercase, number and/or special character)
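The two rules above can be sketched as a quick local check before you pick a password. This is a hedged illustration only: the actual server-side policy (e.g. exactly which characters count as "special", or additional dictionary checks) may be stricter than this sketch.

```python
import string

def meets_policy(password: str) -> bool:
    """Rough local check of the stated rules: at least 10 characters,
    drawn from at least 3 of the 4 classes (lowercase, uppercase,
    digit, special character)."""
    if len(password) < 10:
        return False
    classes = [
        any(c.islower() for c in password),
        any(c.isupper() for c in password),
        any(c.isdigit() for c in password),
        any(c in string.punctuation for c in password),
    ]
    return sum(classes) >= 3

print(meets_policy("short1A!"))            # too short -> False
print(meets_policy("longenoughpassword"))  # one class only -> False
print(meets_policy("Longenough7pass"))     # 3 classes, 15 chars -> True
```

A password that passes this sketch can still be rejected by the server, so treat it as a sanity check, not a guarantee.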
Once you change your password, it will log you out of mustard. '''Then, log out of the VPN''' (select "Disconnect" from the '''OpenVPN Connect''' application). This step is very important! Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password). Note the following page for available resources:
https://giwiki.gi.ucsc.edu/index.php?title=Firewalled_Computing_Resources_Overview
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
09fa27c2a4a82bf7db76c6b56348aef0c201e65d
Setting Up The VPN on Linux
0
62
607
602
2025-02-10T21:22:11Z
Weiler
3
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions.
Most Linux distributions support OpenVPN client software. While the installation process may vary from distribution to distribution, we describe the process for Ubuntu, which should work on most Ubuntu versions and other Ubuntu/Debian derivatives. Do not install this software on public or shared computers!
Before installing our VPN profile, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install our VPN profile. You will need to download our OpenVPN client configuration file from this link:
https://giwiki.gi.ucsc.edu/downloads/prism-duo.ovpn
The username and password to access that web link will be sent to you in your account creation welcome email. Save the "prism-duo.ovpn" file locally to your desktop or somewhere else convenient.
We will be installing the Prism VPN profile via the Network Manager GUI interface.
Open '''Network Manager''' from the '''Gnome Settings''' menu, select the '''Network''' tab, and click on the '''VPN +''' symbol:
[[File:Configuring_1.png|600px]]
From the '''Add VPN''' window, click on the '''Import from file...''' option:
[[File:Configuring_2.png|600px]]
Navigate to your .ovpn file (/path/to/your/prism-duo.ovpn) and click the '''Open''' button:
[[File:Configuring_3.png|600px]]
Click on the '''Add''' button:
[[File:Configuring_4.png|600px]]
Finally, click the '''On/Off''' button to turn on the VPN:
[[File:Configuring_5.png|600px]]
Once you authenticate to the VPN (username/password/MFA), log in via SSH to 'mustard.prism', for example, and you will be asked to change your password.
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). It will ask you for a password; type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you log in successfully to mustard, it will ask you to change your password. It will ask for your current password one more time, then it will ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: At least 3 character classes (lowercase, uppercase, number and/or special character)
Once you change your password, it will log you out of mustard. Then, log out of the VPN (toggle the '''On/Off''' button from the Network Manager GUI VPN interface). Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password). Note the following page for available resources:
https://giwiki.gi.ucsc.edu/index.php?title=Firewalled_Computing_Resources_Overview
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
80863c381c21dc002319a473a1d106b846cf1053
625
607
2025-02-11T00:19:25Z
Weiler
3
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions.
Most Linux distributions support OpenVPN client software. While the installation process may vary from distribution to distribution, we describe the process for Ubuntu, which should work on most Ubuntu versions and other Ubuntu/Debian derivatives. Do not install this software on public or shared computers!
Before installing our VPN profile, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment, or if you have already done this a while ago, continue to install our VPN profile. You will need to download our OpenVPN client configuration file from this link:
https://giwiki.gi.ucsc.edu/downloads/prism-duo.ovpn
The username and password to access that web link will be sent to you in your account creation welcome email. Save the "prism-duo.ovpn" file locally to your desktop or somewhere else convenient.
We will be installing the Prism VPN profile via the Network Manager GUI interface.
Open '''Network Manager''' from the '''Gnome Settings''' menu, select the '''Network''' tab, and click on the '''VPN +''' symbol:
[[File:Configuring_1.png|600px]]
From the '''Add VPN''' window, click on the '''Import from file...''' option:
[[File:Configuring_2.png|600px]]
Navigate to your .ovpn file (/path/to/your/prism-duo.ovpn) and click the '''Open''' button:
[[File:Configuring_3.png|600px]]
Click on the '''Add''' button:
[[File:Configuring_4.png|600px]]
Finally, click the '''On/Off''' button to turn on the VPN:
[[File:Configuring_5.png|600px]]
Once you authenticate to the VPN (username/password/MFA), log in via SSH to 'mustard.prism', for example, and you will be asked to change your password.
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). It will ask you for a password; type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you log in successfully to mustard, it will ask you to change your password. It will ask for your current password one more time, then it will ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: At least 3 character classes (lowercase, uppercase, number and/or special character)
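The two rules above can be sanity-checked locally before you commit to a new password. A minimal sketch (a hypothetical helper for illustration, not an official GI tool — the server does its own enforcement):

```python
import string

def password_ok(pw: str) -> bool:
    """Check the two rules stated above: at least 10 characters, and
    at least 3 of the 4 classes (lowercase, uppercase, digit, special)."""
    classes = [
        any(c.islower() for c in pw),            # lowercase
        any(c.isupper() for c in pw),            # uppercase
        any(c.isdigit() for c in pw),            # number
        any(c in string.punctuation for c in pw) # special character
    ]
    return len(pw) >= 10 and sum(classes) >= 3
```

A 14-character all-lowercase-plus-digits password still fails, since it only covers 2 character classes.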
Once you change your password, it will log you out of mustard. '''Then, log out of the VPN''' (toggle the '''On/Off''' button from the Network Manager GUI VPN interface). This step is very important! Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password). Note the following page for available resources:
https://giwiki.gi.ucsc.edu/index.php?title=Firewalled_Computing_Resources_Overview
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
87e9f64c211703378a639260335fa48dbc9a398a
642
625
2025-02-13T19:06:59Z
Weiler
3
wikitext
text/x-wiki
'''Before''' following these instructions, please ensure that you have filled out an account request form and completed all the training and requirements as detailed here:
[[Requirement_for_users_to_get_GI_VPN_access]]
After completing those requirements, you should have received a welcome email from us explaining that your account is ready. Once you have received that email, continue following these instructions.
Most Linux flavors support OpenVPN client software. While the installation process may vary from flavor to flavor, we will be describing the process to get you going for Ubuntu, which should work on most Ubuntu versions and other Ubuntu/Debian derivatives. Do not install this software on public or shared computers!
Before installing our VPN profile, you must have enrolled your cell phone for Duo MFA using your CruzID account with UCSC. Most folks already have this from when they first started at UCSC. If you don't yet have a CruzID, please contact your sponsor/PI and ask them to help you acquire a CruzID. If you have a CruzID but haven't yet enrolled your cell phone, please follow the instructions here to enroll your phone:
https://its.ucsc.edu/mfa/enroll.html
After confirming your cell phone MFA enrollment (or if you enrolled some time ago), continue by installing our VPN profile. You will need to download our OpenVPN client configuration file from this link:
https://giwiki.gi.ucsc.edu/downloads/prism-duo.ovpn
The username and password to access that web link will be sent to you in your account creation welcome email. Download that file by right-clicking on the link above and selecting "Save Link As...", and save it to your Desktop or some other area you will remember.
We will be installing the Prism VPN profile via the Network Manager GUI interface.
Open '''Network Manager''' from the '''Gnome Settings''' menu, select the '''Network''' tab, and click on the '''VPN +''' symbol:
[[File:Configuring_1.png|600px]]
From the '''Add VPN''' window, click on the '''Import from file...''' option:
[[File:Configuring_2.png|600px]]
Navigate to your .ovpn file (/path/to/your/prism-duo.ovpn) and click on the '''Open''' button:
[[File:Configuring_3.png|600px]]
Click on the '''Add''' button:
[[File:Configuring_4.png|600px]]
Finally, click the '''On/Off''' button to turn on the VPN:
[[File:Configuring_5.png|600px]]
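If you prefer the command line, the same import can be sketched with NetworkManager's nmcli tool (this assumes the NetworkManager OpenVPN plugin is installed and that you saved the file to your Desktop; the resulting connection name may differ on your system):

```shell
# Import the OpenVPN profile into NetworkManager
nmcli connection import type openvpn file ~/Desktop/prism-duo.ovpn

# Bring the VPN up (NetworkManager typically names the connection after the file)
nmcli connection up prism-duo
```

You can confirm the profile was imported with `nmcli connection show`.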
Once you authenticate to the VPN (username/password/MFA), log in via SSH to 'mustard.prism', for example, and you will be asked to change your password.
ssh username@mustard.prism
Where "username" is the username we sent you in the welcome email (incidentally, it is also your CruzID username). It will ask you for a password; just type in the password we sent you in your account creation welcome email. When you type the password, the characters '''will not''' echo to the screen, so it will not show you what you are typing. Once you log in successfully to mustard, it will ask you to change your password. It will ask for your current password one more time, then it will ask you to choose a new password, which you will need to enter twice. Again, whatever password you choose '''will not''' echo to the screen. Your new password must be:
1: At least 10 characters long
2: At least 3 character classes (lowercase, uppercase, number and/or special character)
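The two rules above can be sanity-checked locally before you commit to a new password. A minimal sketch (a hypothetical helper for illustration, not an official GI tool — the server does its own enforcement):

```python
import string

def password_ok(pw: str) -> bool:
    """Check the two rules stated above: at least 10 characters, and
    at least 3 of the 4 classes (lowercase, uppercase, digit, special)."""
    classes = [
        any(c.islower() for c in pw),            # lowercase
        any(c.isupper() for c in pw),            # uppercase
        any(c.isdigit() for c in pw),            # number
        any(c in string.punctuation for c in pw) # special character
    ]
    return len(pw) >= 10 and sum(classes) >= 3
```

A 14-character all-lowercase-plus-digits password still fails, since it only covers 2 character classes.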
Once you change your password, it will log you out of mustard. '''Then, log out of the VPN''' (toggle the '''On/Off''' button from the Network Manager GUI VPN interface). This step is very important! Then, log back into the VPN using your '''new''' password. It will send another Duo MFA push to your phone, then you should be logged in!
Then feel free to ssh to any of our firewalled servers (using your new password). Note the following page for available resources:
https://giwiki.gi.ucsc.edu/index.php?title=Firewalled_Computing_Resources_Overview
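If you log in to these hosts regularly, an entry in ~/.ssh/config saves retyping (mustard.prism is the example host from this page; replace "your_cruzid" with your own username):

```
Host mustard
    HostName mustard.prism
    User your_cruzid
```

With that in place, `ssh mustard` is equivalent to `ssh your_cruzid@mustard.prism`.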
As always, if you have any questions, please email '''cluster-admin@soe.ucsc.edu''' for help.
709528375ace175e349d47a6fd9fe74fec810cb0
Genomics Institute Computing Information
0
6
611
603
2025-02-10T22:33:37Z
Weiler
3
wikitext
text/x-wiki
Welcome to the Genomics Institute Computing Information Repository! Browse the topics below for help in the area you are interested in.
== GI Public Computing Environment ==
*[[How to access the public servers]]
== GI Firewalled Computing Environment (PRISM) ==
*[[Access to the Firewalled Compute Servers]]
*[[Firewalled Computing Resources Overview]]
*[[Firewalled Environment Storage Overview]]
*[[Firewalled User Account and Storage Cost]]
*[[Grafana Performance Metrics]]
*[[Visual Studio Code (vscode) Configuration Tweaks]]
*[http://logserv.gi.ucsc.edu/cgi-bin/private-groups.cgi '''/private/groups''' Data Usage Graphs]
*[[Resetting your VPN/PRISM Password]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
*[[Setting Up The VPN on MacOS]]
*[[Setting Up The VPN on Windows]]
*[[Setting Up The VPN on Linux]]
*[[Multi Factor Authentication (MFA) Frequently Asked Questions]]
*[[Converting From Non-MFA VPN to the MFA-Enabled VPN on MacOS]]
*[[Converting From Non-MFA VPN to the MFA-Enabled VPN on Windows]]
*[[Converting From Non-MFA VPN to the MFA-Enabled VPN on Linux]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Cluster Etiquette]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[GPU Resources]]
*[[Quick Reference Guide]]
*[[Convenient Slurm Commands]]
*[[Slurm Queues (Partitions) and Resource Management]]
*[[Slurm Tips for vg]]
*[[Slurm Tips for Toil]]
*[[Using Docker under Slurm]]
*[[Phoenix WDL Tutorial]]
==General Docker Information==
*[[Running a Container as a non-root User]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
d6ed41e0729a8499a6c36f4d8ddc7dbcb60cb62c
653
611
2025-03-15T15:25:56Z
Weiler
3
/* VPN Access */
wikitext
text/x-wiki
Welcome to the Genomics Institute Computing Information Repository! Browse the topics below for help in the area you are interested in.
== GI Public Computing Environment ==
*[[How to access the public servers]]
== GI Firewalled Computing Environment (PRISM) ==
*[[Access to the Firewalled Compute Servers]]
*[[Firewalled Computing Resources Overview]]
*[[Firewalled Environment Storage Overview]]
*[[Firewalled User Account and Storage Cost]]
*[[Grafana Performance Metrics]]
*[[Visual Studio Code (vscode) Configuration Tweaks]]
*[http://logserv.gi.ucsc.edu/cgi-bin/private-groups.cgi '''/private/groups''' Data Usage Graphs]
*[[Resetting your VPN/PRISM Password]]
==VPN Access==
*[[Requirement for users to get GI VPN access]]
*[[Setting Up The VPN on MacOS]]
*[[Setting Up The VPN on Windows]]
*[[Setting Up The VPN on Linux]]
*[[Multi Factor Authentication (MFA) Frequently Asked Questions]]
*[[Converting From Non-MFA VPN to the MFA-Enabled VPN on MacOS]]
*[[Converting From Non-MFA VPN to the MFA-Enabled VPN on Windows]]
*[[Converting From Non-MFA VPN to the MFA-Enabled VPN on Linux]]
*[[Duo Pushes Aren't Being Sent to My Phone!]]
== NIH dbGaP Access Requirements ==
*[[Requirements for dbGaP Access]]
== giCloud Openstack ==
*[[Overview of giCloud in the Genomics Institute]]
*[[Quick Start Instructions to Get Rolling with OpenStack]]
== Amazon Web Services Information ==
*[[Overview of Getting and Using an AWS IAM Account]]
*[[AWS Account List and Numbers]]
*[[AWS Shared Bucket Usage Graphs]]
*[[AWS Best Practices]]
*[[AWS S3 Lifecycle Management]]
== Slurm at the Genomics Institute ==
*[[Overview of using Slurm]]
*[[Cluster Etiquette]]
*[[Annotated Slurm Script]]
*[[Job Arrays]]
*[[GPU Resources]]
*[[Quick Reference Guide]]
*[[Convenient Slurm Commands]]
*[[Slurm Queues (Partitions) and Resource Management]]
*[[Slurm Tips for vg]]
*[[Slurm Tips for Toil]]
*[[Using Docker under Slurm]]
*[[Phoenix WDL Tutorial]]
==General Docker Information==
*[[Running a Container as a non-root User]]
== Problems or technical support ==
If you have any problems with the GI computing environment, please send an email to '''cluster-admin@soe.ucsc.edu'''
0abff570973ddd882410ea17c561468c10863c9d
Converting From Non-MFA VPN to the MFA-Enabled VPN on MacOS
0
70
612
2025-02-10T23:09:42Z
Weiler
3
Created page with "If you are using Tunnelblick on MacOS and you are looking to convert to the new MFA-enabled GI VPN, you have come to the right place. You must already have Duo set up with your CruzID (which most of you do). If for some reason you don't have Duo set up yet on your phone, go here to enroll a device and configure Push Notifications with Duo before continuing: https://its.ucsc.edu/mfa/enroll.html Disconnect from the VPN if you are already connected. Then you will need..."
wikitext
text/x-wiki
If you are using Tunnelblick on MacOS and you are looking to convert to the new MFA-enabled GI VPN, you have come to the right place. You must already have Duo set up with your CruzID (which most of you do). If for some reason you don't have Duo set up yet on your phone, go here to enroll a device and configure Push Notifications with Duo before continuing:
https://its.ucsc.edu/mfa/enroll.html
Disconnect from the VPN if you are already connected.
Then you will need to download the new OpenVPN config file from here:
https://giwiki.gi.ucsc.edu/downloads/prism-duo.ovpn
The credentials to access that website are username: '''genecats''' and password: '''KiloKluster'''
Download that file to your Desktop. Then open Tunnelblick and click on the Tunnelblick icon on the top right of your screen next to the date. It kind of looks like a small tunnel. In the window that opens, select "VPN Details...".
In the resulting window, select the "Configurations" tab on the top. You will see a list of Configurations on the left, and it should include the current configuration you use to connect. It may be called 'prism' or maybe 'client'.
Drag the new configuration (from your Desktop), called '''prism-duo.ovpn''', into the Configurations area beneath your old configuration. It should import the configuration. It will ask you if you want to install it for "Only You" or "All Users". Click "Only You". You will also be asked to type in your laptop password.
That's it! Select the new configuration on the left and click the "Connect" button on the bottom right. It will ask you for your usual GI PRISM username and password that you usually use to connect to our VPN, and after that it will send a Duo Push notification to your phone, and then you should be logged in. Other than the Duo Push, the VPN behaves exactly like it did before.
As always, please email '''cluster-admin@soe.ucsc.edu''' if you need help or have any questions.
8921aec67d23447c679f1e260edd4322df12781b
613
612
2025-02-10T23:10:37Z
Weiler
3
wikitext
text/x-wiki
If you are using Tunnelblick on MacOS and you are looking to convert to the new MFA-enabled GI VPN, you have come to the right place. You must already have Duo set up with your CruzID (which most of you do). If for some reason you don't have Duo set up yet on your phone, go here to enroll a device and configure Push Notifications with Duo before continuing:
https://its.ucsc.edu/mfa/enroll.html
Disconnect from the VPN if you are already connected.
Then you will need to download the new OpenVPN config file from here:
https://giwiki.gi.ucsc.edu/downloads/prism-duo.ovpn
The credentials to access that website are username: '''genecats''' and password: '''KiloKluster'''
Download that file to your Desktop. Then open Tunnelblick and click on the Tunnelblick icon on the top right of your screen next to the date. It kind of looks like a small tunnel. In the window that opens, select "VPN Details...".
In the resulting window, select the "Configurations" tab on the top. You will see a list of Configurations on the left, and it should include the current configuration you use to connect. It may be called 'prism' or maybe 'client'.
Drag the new configuration (from your Desktop), called '''prism-duo.ovpn''' into the Configurations area beneath your old configuration. It should import the configuration. It will ask you if you want to install it for "Only You" or "All Users". Click "Only You". You will also be asked to type in your laptop password.
That's it! Select the new configuration on the left and click the "Connect" button on the bottom right. It will ask you for your usual GI PRISM username and password that you usually use to connect to our VPN, and after that it will send a Duo Push notification to your phone, and then you should be logged in. Other than the Duo Push, the VPN behaves exactly like it did before.
As always, please email '''cluster-admin@soe.ucsc.edu''' if you need help or have any questions.
971fce488a915ec6acb3d6a443e54bd0f4bfd05e
614
613
2025-02-10T23:12:25Z
Weiler
3
wikitext
text/x-wiki
If you are using Tunnelblick on MacOS and you are looking to convert to the new MFA-enabled GI VPN, you have come to the right place. You must already have Duo set up with your CruzID (which most of you do). If for some reason you don't have Duo set up yet on your phone, go here to enroll a device and configure Push Notifications with Duo before continuing:
https://its.ucsc.edu/mfa/enroll.html
Disconnect from the VPN if you are already connected.
Then you will need to download the new OpenVPN config file from here:
https://giwiki.gi.ucsc.edu/downloads/prism-duo.ovpn
The credentials to access that website are username: '''genecats''' and password: '''KiloKluster'''
Download that file to your Desktop. Then open Tunnelblick and click on the Tunnelblick icon on the top right of your screen next to the date. It kind of looks like a small tunnel. In the window that opens, select "VPN Details...".
In the resulting window, select the "Configurations" tab on the top. You will see a list of Configurations on the left, and it should include the current configuration you use to connect. It may be called 'prism' or maybe 'client'.
Drag the new configuration called '''prism-duo.ovpn''' (from your Desktop) into the Configurations area beneath your old configuration. It should import the configuration. It will ask you if you want to install it for "Only You" or "All Users". Click "Only You". You will also be asked to type in your laptop password.
That's it! Select the new configuration on the left and click the "Connect" button on the bottom right. It will ask you for your usual GI PRISM username and password that you usually use to connect to our VPN, and after that it will send a Duo Push notification to your phone, and then you should be logged in. Other than the Duo Push, the VPN behaves exactly like it did before.
If you have issues you can always revert to the old configuration, which will still work for a while. We will disable the old VPN soon though, so make every effort to get the new VPN setup working.
As always, please email '''cluster-admin@soe.ucsc.edu''' if you need help or have any questions.
3c7d17f5f3491036be71afead628a5ec4991a53a
615
614
2025-02-10T23:15:59Z
Weiler
3
wikitext
text/x-wiki
If you are using Tunnelblick on MacOS and you are looking to convert to the new MFA-enabled GI VPN, you have come to the right place. You must already have Duo set up with your CruzID (which most of you do). If for some reason you don't have Duo set up yet on your phone, go here to enroll a device and configure Push Notifications with Duo before continuing:
https://its.ucsc.edu/mfa/enroll.html
Disconnect from the VPN if you are already connected.
Then you will need to download the new OpenVPN config file from here:
https://giwiki.gi.ucsc.edu/downloads/prism-duo.ovpn
The credentials to access that website are username: '''genecats''' and password: '''KiloKluster'''
Download that file to your Desktop. Then open Tunnelblick and click on the Tunnelblick icon on the top right of your screen next to the date. It kind of looks like a small tunnel. In the window that opens, select "VPN Details...".
In the resulting window, select the "Configurations" tab on the top. You will see a list of Configurations on the left, and it should include the current configuration you use to connect. It may be called 'prism' or maybe 'client'.
Drag the new configuration called '''prism-duo.ovpn''' (from your Desktop) into the Configurations area beneath your old configuration. It should import the configuration. It will ask you if you want to install it for "Only You" or "All Users". Click "Only You". You will also be asked to type in your laptop password.
That's it! Select the new configuration on the left and click the "Connect" button on the bottom right. It will ask you for your usual GI PRISM username and password that you usually use to connect to our VPN, and after that it will send a Duo Push notification to your phone, and then you should be logged in. Other than the Duo Push, the VPN behaves exactly like it did before.
If you have issues you can always revert to the old configuration, which will still work for a while. We will disable the old VPN soon though, so make every effort to get the new VPN setup working.
Once you have the new VPN working, feel free to delete the old profile from Tunnelblick by clicking on the old profile in the "Configurations" window, then click on the '''-''' button below to remove the old configuration.
As always, please email '''cluster-admin@soe.ucsc.edu''' if you need help or have any questions.
a88a7605f12740759e9db0240f3e412bbd814047
616
615
2025-02-10T23:16:23Z
Weiler
3
wikitext
text/x-wiki
If you are using Tunnelblick on MacOS and you are looking to convert to the new MFA-enabled GI VPN, you have come to the right place. You must already have Duo set up with your CruzID (which most of you do). If for some reason you don't have Duo set up yet on your phone, go here to enroll a device and configure Push Notifications with Duo before continuing:
https://its.ucsc.edu/mfa/enroll.html
Disconnect from the VPN if you are already connected.
Then you will need to download the new OpenVPN config file from here:
https://giwiki.gi.ucsc.edu/downloads/prism-duo.ovpn
The credentials to access that website are username: '''genecats''' and password: '''KiloKluster'''
Download that file to your Desktop. Then open Tunnelblick and click on the Tunnelblick icon on the top right of your screen next to the date. It kind of looks like a small tunnel. In the window that opens, select "VPN Details...".
In the resulting window, select the "Configurations" tab on the top. You will see a list of Configurations on the left, and it should include the current configuration you use to connect. It may be called 'prism' or maybe 'client'.
Drag the new configuration called '''prism-duo.ovpn''' (from your Desktop) into the Configurations area beneath your old configuration. It should import the configuration. It will ask you if you want to install it for "Only You" or "All Users". Click "Only You". You will also be asked to type in your laptop password.
That's it! Select the new configuration on the left and click the "Connect" button on the bottom right. It will ask you for your usual GI PRISM username and password that you usually use to connect to our VPN, and after that it will send a Duo Push notification to your phone, and then you should be logged in. Other than the Duo Push, the VPN behaves exactly like it did before.
If you have issues you can always revert to the old configuration, which will still work for a while. We will disable the old VPN soon though, so make every effort to get the new VPN setup working.
Once you have the new VPN working, feel free to delete the old profile from Tunnelblick by clicking on the old profile in the "Configurations" window, then click on the '''"-"''' button below to remove the old configuration.
As always, please email '''cluster-admin@soe.ucsc.edu''' if you need help or have any questions.
08929e7aafadddbb88f9ced092deb913715fac92
617
616
2025-02-10T23:16:41Z
Weiler
3
wikitext
text/x-wiki
If you are using Tunnelblick on MacOS and you are looking to convert to the new MFA-enabled GI VPN, you have come to the right place. You must already have Duo set up with your CruzID (which most of you do). If for some reason you don't have Duo set up yet on your phone, go here to enroll a device and configure Push Notifications with Duo before continuing:
https://its.ucsc.edu/mfa/enroll.html
Disconnect from the VPN if you are already connected.
Then you will need to download the new OpenVPN config file from here:
https://giwiki.gi.ucsc.edu/downloads/prism-duo.ovpn
The credentials to access that website are username: '''genecats''' and password: '''KiloKluster'''
Download that file to your Desktop. Then open Tunnelblick and click on the Tunnelblick icon on the top right of your screen next to the date. It kind of looks like a small tunnel. In the window that opens, select "VPN Details...".
In the resulting window, select the "Configurations" tab on the top. You will see a list of Configurations on the left, and it should include the current configuration you use to connect. It may be called 'prism' or maybe 'client'.
Drag the new configuration called '''prism-duo.ovpn''' (from your Desktop) into the Configurations area beneath your old configuration. It should import the configuration. It will ask you if you want to install it for "Only You" or "All Users". Click "Only You". You will also be asked to type in your laptop password.
That's it! Select the new configuration on the left and click the "Connect" button on the bottom right. It will ask you for your usual GI PRISM username and password that you usually use to connect to our VPN, and after that it will send a Duo Push notification to your phone, and then you should be logged in. Other than the Duo Push, the VPN behaves exactly like it did before.
If you have issues you can always revert to the old configuration, which will still work for a while. We will disable the old VPN soon though, so make every effort to get the new VPN setup working.
Once you have the new VPN working, feel free to delete the old profile from Tunnelblick by clicking on the old profile in the "Configurations" window, then click on the '''"-"''' button below to remove the old configuration.
As always, please email '''cluster-admin@soe.ucsc.edu''' if you need help or have any questions.
6fdb4f929853d90a0e7ca8f7ee21f8d1f7781369
618
617
2025-02-10T23:23:26Z
Weiler
3
wikitext
text/x-wiki
If you are using Tunnelblick on MacOS and you are looking to convert to the new MFA-enabled GI VPN, you have come to the right place. You must already have Duo set up with your CruzID (which most of you do). If for some reason you don't have Duo set up yet on your phone, go here to enroll a device and configure Push Notifications with Duo before continuing:
https://its.ucsc.edu/mfa/enroll.html
OK! Let's get to it.
Disconnect from the VPN if you are already connected.
Then you will need to download the new OpenVPN config file from here:
https://giwiki.gi.ucsc.edu/downloads/prism-duo.ovpn
The credentials to access that website are username: '''genecats''' and password: '''KiloKluster'''
Download that file to your Desktop. Then open Tunnelblick and click on the Tunnelblick icon on the top right of your screen next to the date. It kind of looks like a small tunnel. In the window that opens, select "VPN Details...".
In the resulting window, select the "Configurations" tab on the top. You will see a list of Configurations on the left, and it should include the current configuration you use to connect. It may be called 'prism' or maybe 'client'.
Drag the new configuration called '''prism-duo.ovpn''' (from your Desktop) into the Configurations area beneath your old configuration. It should import the configuration. It will ask you if you want to install it for "Only You" or "All Users". Click "Only You". You will also be asked to type in your laptop password.
That's it! Select the new configuration on the left and click the "Connect" button on the bottom right. It will ask you for your usual GI PRISM username and password that you usually use to connect to our VPN, and after that it will send a Duo Push notification to your phone, and then you should be logged in. Other than the Duo Push, the VPN behaves exactly like it did before.
If you have issues you can always revert to the old configuration, which will still work for a while. We will disable the old VPN soon though, so make every effort to get the new VPN setup working.
Once you have the new VPN working, feel free to delete the old profile from Tunnelblick by clicking on the old profile in the "Configurations" window, then click on the '''"-"''' button below to remove the old configuration.
As always, please email '''cluster-admin@soe.ucsc.edu''' if you need help or have any questions.
71d314d25d4ce49a3e8b0275a6a934750a532645
635
618
2025-02-13T19:04:02Z
Weiler
3
wikitext
text/x-wiki
If you are using Tunnelblick on MacOS and you are looking to convert to the new MFA-enabled GI VPN, you have come to the right place. You must already have Duo set up with your CruzID (which most of you do). If for some reason you don't have Duo set up yet on your phone, go here to enroll a device and configure Push Notifications with Duo before continuing:
https://its.ucsc.edu/mfa/enroll.html
OK! Let's get to it.
Disconnect from the VPN if you are already connected.
Then you will need to download the new OpenVPN config file from here:
https://giwiki.gi.ucsc.edu/downloads/prism-duo.ovpn
The credentials to access that website are username: '''genecats''' and password: '''KiloKluster'''
Download that file to your Desktop by right-clicking on the link above and selecting "Save Link As...". Then open Tunnelblick and click on the Tunnelblick icon on the top right of your screen next to the date. It kind of looks like a small tunnel. In the window that opens, select "VPN Details...".
In the resulting window, select the "Configurations" tab on the top. You will see a list of Configurations on the left, and it should include the current configuration you use to connect. It may be called 'prism' or maybe 'client'.
Drag the new configuration called '''prism-duo.ovpn''' (from your Desktop) into the Configurations area beneath your old configuration. It should import the configuration. It will ask you if you want to install it for "Only You" or "All Users". Click "Only You". You will also be asked to type in your laptop password.
That's it! Select the new configuration on the left and click the "Connect" button on the bottom right. It will ask you for your usual GI PRISM username and password that you usually use to connect to our VPN, and after that it will send a Duo Push notification to your phone, and then you should be logged in. Other than the Duo Push, the VPN behaves exactly like it did before.
If you have issues you can always revert to the old configuration, which will still work for a while. We will disable the old VPN soon though, so make every effort to get the new VPN setup working.
Once you have the new VPN working, feel free to delete the old profile from Tunnelblick by clicking on the old profile in the "Configurations" window, then click on the '''"-"''' button below to remove the old configuration.
As always, please email '''cluster-admin@soe.ucsc.edu''' if you need help or have any questions.
c9c3dde32cde8901cb05d2b694736d98d89630cc
636
635
2025-02-13T19:04:32Z
Weiler
3
wikitext
text/x-wiki
If you are using Tunnelblick on MacOS and you are looking to convert to the new MFA-enabled GI VPN, you have come to the right place. You must already have Duo set up with your CruzID (which most of you do). If for some reason you don't have Duo set up yet on your phone, go here to enroll a device and configure Push Notifications with Duo before continuing:
https://its.ucsc.edu/mfa/enroll.html
OK! Let's get to it.
Disconnect from the VPN if you are already connected.
Then you will need to download the new OpenVPN config file from here:
https://giwiki.gi.ucsc.edu/downloads/prism-duo.ovpn
The credentials to access that website are username: '''genecats''' and password: '''KiloKluster'''
Download that file by right-clicking on the link above and selecting "Save Link As...", and save it to your Desktop or some other area you will remember. Then open Tunnelblick and click on the Tunnelblick icon on the top right of your screen next to the date. It kind of looks like a small tunnel. In the window that opens, select "VPN Details...".
In the resulting window, select the "Configurations" tab on the top. You will see a list of Configurations on the left, and it should include the current configuration you use to connect. It may be called 'prism' or maybe 'client'.
Drag the new configuration called '''prism-duo.ovpn''' (from your Desktop) into the Configurations area beneath your old configuration. It should import the configuration. It will ask you if you want to install it for "Only You" or "All Users". Click "Only You". You will also be asked to type in your laptop password.
That's it! Select the new configuration on the left and click the "Connect" button on the bottom right. It will ask you for your usual GI PRISM username and password that you usually use to connect to our VPN, and after that it will send a Duo Push notification to your phone, and then you should be logged in. Other than the Duo Push, the VPN behaves exactly like it did before.
If you have issues you can always revert to the old configuration, which will still work for a while. We will disable the old VPN soon though, so make every effort to get the new VPN setup working.
Once you have the new VPN working, feel free to delete the old profile from Tunnelblick by clicking on the old profile in the "Configurations" window, then click on the '''"-"''' button below to remove the old configuration.
As always, please email '''cluster-admin@soe.ucsc.edu''' if you need help or have any questions.
a3aa00fb3a64a76d0f286ef0a66989e6a270805b
639
636
2025-02-13T19:05:38Z
Weiler
3
wikitext
text/x-wiki
If you are using Tunnelblick on MacOS and you are looking to convert to the new MFA-enabled GI VPN, you have come to the right place. You must already have Duo set up with your CruzID (which most of you do). If for some reason you don't have Duo set up yet on your phone, go here to enroll a device and configure Push Notifications with Duo before continuing:
https://its.ucsc.edu/mfa/enroll.html
OK! Let's get to it.
Disconnect from the VPN if you are already connected.
Then you will need to download the new OpenVPN config file from here:
https://giwiki.gi.ucsc.edu/downloads/prism-duo.ovpn
The credentials to access that website are username: '''genecats''' and password: '''KiloKluster'''
Download that file by right-clicking on the link above and selecting "Save Link As...", and save it to your Desktop or some other area you will remember. Then open Tunnelblick and click on the Tunnelblick icon on the top right of your screen next to the date. It kind of looks like a small tunnel. In the window that opens, select "VPN Details...".
In the resulting window, select the "Configurations" tab on the top. You will see a list of Configurations on the left, and it should include the current configuration you use to connect. It may be called 'prism' or maybe 'client'.
Drag the new configuration called '''prism-duo.ovpn''' (from your Desktop) into the Configurations area beneath your old configuration. It should import the configuration. It will ask you if you want to install it for "Only You" or "All Users". Click "Only You". You will also be asked to type in your laptop password.
That's it! Select the new configuration on the left and click the "Connect" button on the bottom right. It will ask you for your usual GI PRISM username and password that you usually use to connect to our VPN, and after that it will send a Duo Push notification to your phone, and then you should be logged in. Other than the Duo Push, the VPN behaves exactly like it did before.
If you have issues you can always revert back to the old configuration, which will still work for a while. We will disable the old VPN soon though, so make every effort to get the new VPN setup working.
Once you have the new VPN working, feel free to delete the old profile from Tunnelblick by clicking on the old profile in the "Configurations" window, then click on the '''"-"''' button below to remove the old configuration.
As always, please email '''cluster-admin@soe.ucsc.edu''' if you need help or have any questions.
64c018dc73afbb98a06e4c040da500facda56d0b
650
639
2025-02-21T00:39:36Z
Weiler
3
wikitext
text/x-wiki
If you are using Tunnelblick on MacOS and you are looking to convert to the new MFA-enabled GI VPN, you have come to the right place. You must already have Duo set up with your CruzID (which most of you do). If for some reason you don't have Duo set up yet on your phone, go here to enroll a device and configure Push Notifications with Duo before continuing:
https://its.ucsc.edu/mfa/enroll.html
OK! Let's get to it.
Disconnect from the VPN if you are already connected.
Then you will need to download the new OpenVPN config file from here:
https://giwiki.gi.ucsc.edu/downloads/
The credentials to access that website are username: '''genecats''' and password: '''KiloKluster'''
Go to the link above, right-click on '''prism-duo.ovpn''', and select "Save Link As...", saving the file to your Desktop or some other location you will remember. Then open Tunnelblick and click on the Tunnelblick icon on the top right of your screen next to the date. It kind of looks like a small tunnel. In the window that opens, select "VPN Details...".
In the resulting window, select the "Configurations" tab on the top. You will see a list of Configurations on the left, and it should include the current configuration you use to connect. It may be called 'prism' or maybe 'client'.
Drag the new configuration called '''prism-duo.ovpn''' (from your Desktop) into the Configurations area beneath your old configuration. It should import the configuration. It will ask you if you want to install it for "Only You" or "All Users". Click "Only You". You will also be asked to type in your laptop password.
That's it! Select the new configuration on the left and click the "Connect" button on the bottom right. It will ask you for your usual GI PRISM username and password that you usually use to connect to our VPN, and after that it will send a Duo Push notification to your phone, and then you should be logged in. Other than the Duo Push, the VPN behaves exactly like it did before.
If you have issues you can always revert back to the old configuration, which will still work for a while. We will disable the old VPN soon though, so make every effort to get the new VPN setup working.
Once you have the new VPN working, feel free to delete the old profile from Tunnelblick by clicking on the old profile in the "Configurations" window, then click on the '''"-"''' button below to remove the old configuration.
As always, please email '''cluster-admin@soe.ucsc.edu''' if you need help or have any questions.
3ec348a159bf12f70ff8521a9b99959400a3729f
652
650
2025-02-21T00:55:15Z
Weiler
3
wikitext
text/x-wiki
If you are using Tunnelblick on MacOS and you are looking to convert to the new MFA-enabled GI VPN, you have come to the right place. You must already have Duo set up with your CruzID (which most of you do). If for some reason you don't have Duo set up yet on your phone, go here to enroll a device and configure Push Notifications with Duo before continuing:
https://its.ucsc.edu/mfa/enroll.html
OK! Let's get to it.
Disconnect from the VPN if you are already connected.
Then you will need to download the new OpenVPN config file from here:
https://giwiki.gi.ucsc.edu/downloads/
The credentials to access that website are username: '''genecats''' and password: '''KiloKluster'''
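If you prefer the terminal, the same file can be fetched with curl, using the downloads URL and credentials above (a sketch; adjust the output path to wherever you want the file saved):

```shell
# Fetch the new OpenVPN config from the downloads page instead of the browser,
# using the site credentials given above
curl -u genecats:KiloKluster \
  -o ~/Desktop/prism-duo.ovpn \
  https://giwiki.gi.ucsc.edu/downloads/prism-duo.ovpn
```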
Go to the link above, right-click on '''prism-duo.ovpn''', and select "Save Link As...", saving the file to your Desktop or some other location you will remember. Then open Tunnelblick and click on the Tunnelblick icon on the top right of your screen next to the date. It kind of looks like a small tunnel. In the window that opens, select "VPN Details...".
In the resulting window, select the "Configurations" tab on the top. You will see a list of Configurations on the left, and it should include the current configuration you use to connect. It may be called 'prism' or maybe 'client'.
Drag the new configuration called '''prism-duo.ovpn''' (from your Desktop) into the Configurations area beneath your old configuration. It should import the configuration. It will ask you if you want to install it for "Only You" or "All Users". Click "Only You". You will also be asked to type in your laptop password.
That's it! Select the new configuration on the left and click the "Connect" button on the bottom right. It will ask you for your usual GI PRISM username and password that you usually use to connect to our VPN, and after that it will send a Duo Push notification to your phone, and then you should be logged in. Other than the Duo Push, the VPN behaves exactly like it did before.
If you have issues you can always revert back to the old configuration, which will still work for a while. We will disable the old VPN soon though, so make every effort to get the new VPN setup working.
Once you have the new VPN working, feel free to delete the old profile from Tunnelblick by clicking on the old profile in the "Configurations" window, then click on the '''"-"''' button below to remove the old configuration.
As always, please email '''cluster-admin@soe.ucsc.edu''' if you need help or have any questions.
3d634230d5b2c695dd10691048d9ad110a73e14a
Converting From Non-MFA VPN to the MFA-Enabled VPN on Linux
0
71
619
2025-02-10T23:40:54Z
Weiler
3
Created page with "If you are using OpenVPN on Linux to connect to the GI VPN and you are looking to convert to the new MFA-enabled GI VPN, you have come to the right place. You must already have Duo set up with your CruzID (which most of you do). If for some reason you don't have Duo set up yet on your phone, go here to enroll a device and configure Push Notifications with Duo before continuing: https://its.ucsc.edu/mfa/enroll.html OK! Let's get to it. Disconnect from the VPN if yo..."
wikitext
text/x-wiki
If you are using OpenVPN on Linux to connect to the GI VPN and you are looking to convert to the new MFA-enabled GI VPN, you have come to the right place. You must already have Duo set up with your CruzID (which most of you do). If for some reason you don't have Duo set up yet on your phone, go here to enroll a device and configure Push Notifications with Duo before continuing:
https://its.ucsc.edu/mfa/enroll.html
OK! Let's get to it.
Disconnect from the VPN if you are already connected. The various flavors and versions of Linux differ in the specifics, so these exact steps may not match your system. This guide is based on the Network Manager in Ubuntu, but most Ubuntu/Debian variants will be similar.
Then you will need to download the new OpenVPN config file from here:
https://giwiki.gi.ucsc.edu/downloads/prism-duo.ovpn
The credentials to access that website are username: '''genecats''' and password: '''KiloKluster'''
Download that file to your Desktop or some other easy to remember location.
We will be installing the Prism VPN profile via the Network Manager GUI interface.
Open '''Network Manager''' from '''Gnome Settings''' option and select the '''Network''' tab and click on the '''VPN +''' symbol:
[[File:Configuring_1.png|600px]]
From the '''Add VPN''' window, click on the '''Import from file...''' option:
[[File:Configuring_2.png|600px]]
You must navigate to your .ovpn file (/path/to/your/prism-duo.ovpn) and click the '''Open''' button:
[[File:Configuring_3.png|600px]]
Click on the '''Add''' button:
[[File:Configuring_4.png|600px]]
Finally, click the '''On/Off''' button to start the new VPN:
[[File:Configuring_5.png|600px]]
That's it! It will ask you for your usual GI PRISM username and password that you usually use to connect to our VPN, and after that it will send a Duo Push notification to your phone, and then you should be logged in. Other than the Duo Push, the VPN behaves exactly like it did before.
If you have issues you can always revert back to the old configuration, which will still work for a while. We will disable the old VPN soon though, so make every effort to get the new VPN setup working.
Once you have the new VPN working, feel free to delete the old profile from the Network Manager.
As always, please email '''cluster-admin@soe.ucsc.edu''' if you need help or have any questions.
145ad699a30df69628bdd1a8e82f9df394dedb25
638
619
2025-02-13T19:05:22Z
Weiler
3
wikitext
text/x-wiki
If you are using OpenVPN on Linux to connect to the GI VPN and you are looking to convert to the new MFA-enabled GI VPN, you have come to the right place. You must already have Duo set up with your CruzID (which most of you do). If for some reason you don't have Duo set up yet on your phone, go here to enroll a device and configure Push Notifications with Duo before continuing:
https://its.ucsc.edu/mfa/enroll.html
OK! Let's get to it.
Disconnect from the VPN if you are already connected. The various flavors and versions of Linux differ in the specifics, so these exact steps may not match your system. This guide is based on the Network Manager in Ubuntu, but most Ubuntu/Debian variants will be similar.
Then you will need to download the new OpenVPN config file from here:
https://giwiki.gi.ucsc.edu/downloads/prism-duo.ovpn
The credentials to access that website are username: '''genecats''' and password: '''KiloKluster'''
Download that file by right-clicking on the link above and selecting "Save Link As...", and save it to your Desktop or some other location you will remember.
We will be installing the Prism VPN profile via the Network Manager GUI interface.
Open '''Network Manager''' from '''Gnome Settings''' option and select the '''Network''' tab and click on the '''VPN +''' symbol:
[[File:Configuring_1.png|600px]]
From the '''Add VPN''' window, click on the '''Import from file...''' option:
[[File:Configuring_2.png|600px]]
You must navigate to your .ovpn file (/path/to/your/prism-duo.ovpn) and click the '''Open''' button:
[[File:Configuring_3.png|600px]]
Click on the '''Add''' button:
[[File:Configuring_4.png|600px]]
Finally, click the '''On/Off''' button to start the new VPN:
[[File:Configuring_5.png|600px]]
That's it! It will ask you for your usual GI PRISM username and password that you usually use to connect to our VPN, and after that it will send a Duo Push notification to your phone, and then you should be logged in. Other than the Duo Push, the VPN behaves exactly like it did before.
If you have issues you can always revert back to the old configuration, which will still work for a while. We will disable the old VPN soon though, so make every effort to get the new VPN setup working.
Once you have the new VPN working, feel free to delete the old profile from the Network Manager.
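If you prefer the command line, NetworkManager's <code>nmcli</code> tool can perform the same import, activation, and cleanup. This is a sketch: it assumes the NetworkManager OpenVPN plugin is installed, and the connection names shown (derived from the .ovpn filenames) may differ on your system.

```shell
# Import the new profile; NetworkManager names it after the file (e.g. "prism-duo")
nmcli connection import type openvpn file ~/Desktop/prism-duo.ovpn
# Bring it up; you will be prompted for your PRISM credentials
nmcli connection up prism-duo
# Once the new VPN works, remove the old profile (its name may differ)
nmcli connection delete prism
```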
As always, please email '''cluster-admin@soe.ucsc.edu''' if you need help or have any questions.
e6b5466d91769705601d6c0ed2ad8ba6fccbd128
Converting From Non-MFA VPN to the MFA-Enabled VPN on Windows
0
72
620
2025-02-10T23:54:55Z
Weiler
3
Created page with "If you are using OpenVPN Connect on Windows 10 or 11 to connect to the GI VPN, and you are looking to convert to the new MFA-enabled GI VPN, you have come to the right place. You must already have Duo set up with your CruzID (which most of you do). If for some reason you don't have Duo set up yet on your phone, go here to enroll a device and configure Push Notifications with Duo before continuing: https://its.ucsc.edu/mfa/enroll.html OK! Let's get to it. Disconnec..."
wikitext
text/x-wiki
If you are using OpenVPN Connect on Windows 10 or 11 to connect to the GI VPN, and you are looking to convert to the new MFA-enabled GI VPN, you have come to the right place. You must already have Duo set up with your CruzID (which most of you do). If for some reason you don't have Duo set up yet on your phone, go here to enroll a device and configure Push Notifications with Duo before continuing:
https://its.ucsc.edu/mfa/enroll.html
OK! Let's get to it.
Disconnect from the VPN if you are already connected.
Then you will need to download the new OpenVPN config file from here:
https://giwiki.gi.ucsc.edu/downloads/prism-duo.ovpn
The credentials to access that website are username: '''genecats''' and password: '''KiloKluster'''
Download that file to your Desktop. Launch the '''OpenVPN Connect''' app (usually there is an icon for it on your Desktop, but you can search for it if not). It will launch and appear in your system tray on the bottom right (the system tray icon kind of looks like a '''^''' icon). You should see the OpenVPN icon there; it looks like a little computer screen with a lock on it. Right-click on the OpenVPN icon in the system tray, and you should see a small menu appear. Select "Import file". In the resulting window, browse to your Desktop or wherever you saved the '''prism-duo.ovpn''' file. Select that file and click "Open".
Once you import the file, you should be able to click on OpenVPN Connect again in the system tray and click "Connect". It should show multiple profiles, one for your old profile and one for your new profile. Select the new one.
That's it! It will ask you for your usual GI PRISM username and password that you usually use to connect to our VPN, and after that it will send a Duo Push notification to your phone, and then you should be logged in. Other than the Duo Push, the VPN behaves exactly like it did before.
If you have issues you can always revert back to the old configuration, which will still work for a while. We will disable the old VPN soon though, so make every effort to get the new VPN setup working.
As always, please email '''cluster-admin@soe.ucsc.edu''' if you need help or have any questions.
d8f39724494e534f87d387d5d7d7cfd421e4c348
621
620
2025-02-11T00:04:16Z
Weiler
3
wikitext
text/x-wiki
If you are using OpenVPN Connect on Windows 10 or 11 to connect to the GI VPN, and you are looking to convert to the new MFA-enabled GI VPN, you have come to the right place. You must already have Duo set up with your CruzID (which most of you do). If for some reason you don't have Duo set up yet on your phone, go here to enroll a device and configure Push Notifications with Duo before continuing:
https://its.ucsc.edu/mfa/enroll.html
OK! Let's get to it.
Disconnect from the VPN if you are already connected.
Then you will need to download the new OpenVPN config file from here:
https://giwiki.gi.ucsc.edu/downloads/prism-duo.ovpn
The credentials to access that website are username: '''genecats''' and password: '''KiloKluster'''
Download that file to your Desktop. Launch the '''OpenVPN Connect''' app (usually there is an icon for it on your Desktop, but you can search for it if not). It will launch and appear in your system tray on the bottom right (the system tray icon kind of looks like a '''^''' icon). You should see the OpenVPN icon there; it looks like a little computer screen with a lock on it. Right-click on the OpenVPN icon in the system tray, and you should see a small menu appear. Select "Import file". In the resulting window, browse to your Desktop or wherever you saved the '''prism-duo.ovpn''' file. Select that file and click "Open".
Once you import the file, you should be able to right click on OpenVPN Connect again in the system tray and select the profile you want to connect to. It should show multiple profiles, one for you old profile and one for your new profile. Select the new one, then select "Connect".
That's it! It will ask you for your usual GI PRISM username and password that you usually use to connect to our VPN, and after that it will send a Duo Push notification to your phone, and then you should be logged in. Other than the Duo Push, the VPN behaves exactly like it did before.
If you have issues you can always revert back to the old configuration, which will still work for a while. We will disable the old VPN soon though, so make every effort to get the new VPN setup working.
As always, please email '''cluster-admin@soe.ucsc.edu''' if you need help or have any questions.
e8c62ede9aff7143008747152172677276743dd3
622
621
2025-02-11T00:04:53Z
Weiler
3
wikitext
text/x-wiki
If you are using OpenVPN Connect on Windows 10 or 11 to connect to the GI VPN, and you are looking to convert to the new MFA-enabled GI VPN, you have come to the right place. You must already have Duo set up with your CruzID (which most of you do). If for some reason you don't have Duo set up yet on your phone, go here to enroll a device and configure Push Notifications with Duo before continuing:
https://its.ucsc.edu/mfa/enroll.html
OK! Let's get to it.
Disconnect from the VPN if you are already connected.
Then you will need to download the new OpenVPN config file from here:
https://giwiki.gi.ucsc.edu/downloads/prism-duo.ovpn
The credentials to access that website are username: '''genecats''' and password: '''KiloKluster'''
Download that file to your Desktop. Launch the '''OpenVPN Connect''' app (usually there is an icon for it on your Desktop, but you can search for it if not). It will launch and appear in your system tray on the bottom right (the system tray icon kind of looks like a '''^''' icon). You should see the OpenVPN icon there; it looks like a little computer screen with a lock on it. Right-click on the OpenVPN icon in the system tray, and you should see a small menu appear. Select "Import file". In the resulting window, browse to your Desktop or wherever you saved the '''prism-duo.ovpn''' file. Select that file and click "Open".
Once you import the file, you should be able to right click on OpenVPN Connect again in the system tray and select the profile you want to connect to. It should show multiple profiles, one for your old profile and one for your new profile. Select the new one, then select "Connect".
That's it! It will ask you for your usual GI PRISM username and password that you usually use to connect to our VPN, and after that it will send a Duo Push notification to your phone, and then you should be logged in. Other than the Duo Push, the VPN behaves exactly like it did before.
If you have issues you can always revert back to the old configuration, which will still work for a while. We will disable the old VPN soon though, so make every effort to get the new VPN setup working.
As always, please email '''cluster-admin@soe.ucsc.edu''' if you need help or have any questions.
ecabb1a8d9c504cfd11057c1b009274ed03c6d21
634
622
2025-02-12T22:17:17Z
Weiler
3
wikitext
text/x-wiki
If you are using OpenVPN Connect on Windows 10 or 11 to connect to the GI VPN, and you are looking to convert to the new MFA-enabled GI VPN, you have come to the right place. You must already have Duo set up with your CruzID (which most of you do). If for some reason you don't have Duo set up yet on your phone, go here to enroll a device and configure Push Notifications with Duo before continuing:
https://its.ucsc.edu/mfa/enroll.html
OK! Let's get to it.
Disconnect from the VPN if you are already connected.
Then you will need to download the new OpenVPN config file from here:
https://giwiki.gi.ucsc.edu/downloads/prism-duo.ovpn
The credentials to access that website are username: '''genecats''' and password: '''KiloKluster'''
Download that file to your Desktop. Launch the '''OpenVPN GUI''' app (usually there is an icon for it on your Desktop, but you can search for it if not). It will launch and appear in your system tray on the bottom right (the system tray icon kind of looks like a '''^''' icon). You should see the OpenVPN icon there; it looks like a little computer screen with a lock on it. Right-click on the OpenVPN icon in the system tray, and you should see a small menu appear. Select "Import file". In the resulting window, browse to your Desktop or wherever you saved the '''prism-duo.ovpn''' file. Select that file and click "Open".
Once you import the file, you should be able to right-click on the OpenVPN GUI icon again in the system tray and select the profile you want to connect to. It should show multiple profiles, one for your old profile and one for your new profile. Select the new one, then select "Connect".
That's it! It will ask you for your usual GI PRISM username and password that you usually use to connect to our VPN, and after that it will send a Duo Push notification to your phone, and then you should be logged in. Other than the Duo Push, the VPN behaves exactly like it did before.
If you have issues you can always revert back to the old configuration, which will still work for a while. We will disable the old VPN soon though, so make every effort to get the new VPN setup working.
As always, please email '''cluster-admin@soe.ucsc.edu''' if you need help or have any questions.
b9f686249fa31e3340331eb0076df472a2bfdba1
637
634
2025-02-13T19:05:00Z
Weiler
3
wikitext
text/x-wiki
If you are using OpenVPN Connect on Windows 10 or 11 to connect to the GI VPN, and you are looking to convert to the new MFA-enabled GI VPN, you have come to the right place. You must already have Duo set up with your CruzID (which most of you do). If for some reason you don't have Duo set up yet on your phone, go here to enroll a device and configure Push Notifications with Duo before continuing:
https://its.ucsc.edu/mfa/enroll.html
OK! Let's get to it.
Disconnect from the VPN if you are already connected.
Then you will need to download the new OpenVPN config file from here:
https://giwiki.gi.ucsc.edu/downloads/prism-duo.ovpn
The credentials to access that website are username: '''genecats''' and password: '''KiloKluster'''
Download that file by right-clicking on the link above and selecting "Save Link As...", and save it to your Desktop or some other location you will remember. Launch the '''OpenVPN GUI''' app (usually there is an icon for it on your Desktop, but you can search for it if not). It will launch and appear in your system tray on the bottom right (the system tray icon kind of looks like a '''^''' icon). You should see the OpenVPN icon there; it looks like a little computer screen with a lock on it. Right-click on the OpenVPN icon in the system tray, and you should see a small menu appear. Select "Import file". In the resulting window, browse to your Desktop or wherever you saved the '''prism-duo.ovpn''' file. Select that file and click "Open".
Once you import the file, you should be able to right-click on the OpenVPN GUI icon again in the system tray and select the profile you want to connect to. It should show multiple profiles, one for your old profile and one for your new profile. Select the new one, then select "Connect".
That's it! It will ask you for your usual GI PRISM username and password that you usually use to connect to our VPN, and after that it will send a Duo Push notification to your phone, and then you should be logged in. Other than the Duo Push, the VPN behaves exactly like it did before.
If you have issues you can always revert back to the old configuration, which will still work for a while. We will disable the old VPN soon though, so make every effort to get the new VPN setup working.
As always, please email '''cluster-admin@soe.ucsc.edu''' if you need help or have any questions.
af0034d9601d2f54f67eb1c421666c07476429d1
Slurm Tips for Toil
0
38
626
470
2025-02-11T20:42:22Z
Anovak
4
Add quotes to protect brackets
wikitext
text/x-wiki
Here are some tips for running Toil workflows on the Phoenix Slurm cluster. Mostly you might want to run WDL workflows, but you can use some of these for other workflows like Cactus. You can also consult [https://github.com/DataBiosphere/toil/blob/master/docs/wdl/running.rst the Toil documentation on WDL workflows].
* Install Toil with WDL support with:
pip3 install --upgrade 'toil[wdl]'
To use a development version of Toil, you can install from source instead:
pip3 install 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl]'
Or for a particular branch:
pip3 install 'git+https://github.com/DataBiosphere/toil.git@issues/123-abc#egg=toil[wdl]'
* You will then need to make sure your '''~/.local/bin''' directory is on your PATH. Open up your '''~/.bashrc''' file and add:
export PATH=$PATH:$HOME/.local/bin
Then make sure to log out and back in again.
* For Toil options, you will want '''--batchSystem slurm''' to make it use Slurm and '''--batchLogsDir ./logs''' (or some other location on a shared filesystem) for the Slurm logs to not get lost.
* You may be able to speed up your workflow with '''--caching true''', to cache data on nodes to be shared among multiple simultaneous tasks.
* If using '''toil-wdl-runner''', you might want to add '''--jobStore ./jobStore''' to make sure the job store is in a defined, shared location so that you can use '''--restart''' later.
* If using '''toil-wdl-runner''', you will want to set the '''SINGULARITY_CACHEDIR''' and '''MINIWDL__SINGULARITY__IMAGE_CACHE''' environment variables for your workflow to locations on shared storage, and possibly to the default cache locations in your home directory. Otherwise Toil will set them to node-local directories for each node, and thus re-download images for each workflow run, and for each cluster node. To avoid this, you could, for example, add the following before your run or in your '''~/.bashrc''':
export SINGULARITY_CACHEDIR=$HOME/.singularity/cache
export MINIWDL__SINGULARITY__IMAGE_CACHE=$HOME/.cache/miniwdl
aec63aec38c0e471a39f3519d9eebfaf81778cf1
627
626
2025-02-11T20:49:14Z
Anovak
4
Change to new extras syntax from https://github.com/pypa/pip/pull/11617
wikitext
text/x-wiki
Here are some tips for running Toil workflows on the Phoenix Slurm cluster. Mostly you might want to run WDL workflows, but you can use some of these for other workflows like Cactus. You can also consult [https://github.com/DataBiosphere/toil/blob/master/docs/wdl/running.rst the Toil documentation on WDL workflows].
* Install Toil with WDL support with:
pip3 install --upgrade 'toil[wdl]'
To use a development version of Toil, you can install from source instead:
pip3 install 'toil[wdl]@git+https://github.com/DataBiosphere/toil.git'
Or for a particular branch:
pip3 install 'toil[wdl]@git+https://github.com/DataBiosphere/toil.git@issues/123-abc'
* You will then need to make sure your '''~/.local/bin''' directory is on your PATH. Open up your '''~/.bashrc''' file and add:
export PATH=$PATH:$HOME/.local/bin
Then make sure to log out and back in again.
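As a sketch, an idempotent version of that PATH change (so re-sourcing '''~/.bashrc''' does not keep growing your PATH) looks like:

```shell
# Append ~/.local/bin to PATH only if it is not already present,
# so repeated shell startups do not duplicate the entry
case ":$PATH:" in
  *":$HOME/.local/bin:"*) ;;                      # already on PATH, nothing to do
  *) export PATH="$PATH:$HOME/.local/bin" ;;
esac
```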
* For Toil options, you will want '''--batchSystem slurm''' to make it use Slurm and '''--batchLogsDir ./logs''' (or some other location on a shared filesystem) for the Slurm logs to not get lost.
* You may be able to speed up your workflow with '''--caching true''', to cache data on nodes to be shared among multiple simultaneous tasks.
* If using '''toil-wdl-runner''', you might want to add '''--jobStore ./jobStore''' to make sure the job store is in a defined, shared location so that you can use '''--restart''' later.
* If using '''toil-wdl-runner''', you will want to set the '''SINGULARITY_CACHEDIR''' and '''MINIWDL__SINGULARITY__IMAGE_CACHE''' environment variables for your workflow to locations on shared storage, and possibly to the default cache locations in your home directory. Otherwise Toil will set them to node-local directories for each node, and thus re-download images for each workflow run, and for each cluster node. To avoid this, you could, for example, add the following before your run or in your '''~/.bashrc''':
export SINGULARITY_CACHEDIR=$HOME/.singularity/cache
export MINIWDL__SINGULARITY__IMAGE_CACHE=$HOME/.cache/miniwdl
351661bba433769fe22065f3ff4bc2a185ba0723
646
627
2025-02-14T19:12:39Z
Anovak
4
wikitext
text/x-wiki
Here are some tips for running Toil workflows on the Phoenix Slurm cluster. Mostly you might want to run WDL workflows, but you can use some of these for other workflows like Cactus. You can also consult [https://github.com/DataBiosphere/toil/blob/master/docs/wdl/running.rst the Toil documentation on WDL workflows].
* Install Toil with WDL support with:
pipx install 'toil[wdl]'
To use a development version of Toil, you can install from source instead:
pipx install 'toil[wdl]@git+https://github.com/DataBiosphere/toil.git'
Or for a particular branch:
pipx install 'toil[wdl]@git+https://github.com/DataBiosphere/toil.git@issues/123-abc'
If you don't have <code>pipx</code>, you would first need to:
python3 -m pip install --user pipx
python3 -m pipx ensurepath
This may in turn require you to log out and back in.
* For Toil options, you will want '''--batchSystem slurm''' to make it use Slurm and '''--batchLogsDir ./logs''' (or some other location on a shared filesystem) for the Slurm logs to not get lost.
* You may be able to speed up your workflow with '''--caching true''', to cache data on nodes to be shared among multiple simultaneous tasks.
* If using '''toil-wdl-runner''', you might want to add '''--jobStore ./jobStore''' to make sure the job store is in a defined, shared location so that you can use '''--restart''' later.
* If using '''toil-wdl-runner''', you will want to set the '''SINGULARITY_CACHEDIR''' and '''MINIWDL__SINGULARITY__IMAGE_CACHE''' environment variables for your workflow to locations on shared storage, and possibly to the default cache locations in your home directory. Otherwise Toil will set them to node-local directories for each node, and thus re-download images for each workflow run, and for each cluster node. To avoid this, you could, for example, add the following before your run or in your '''~/.bashrc''':
export SINGULARITY_CACHEDIR=$HOME/.singularity/cache
export MINIWDL__SINGULARITY__IMAGE_CACHE=$HOME/.cache/miniwdl
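Putting the options above together, a cluster run might look like the following sketch, where '''myworkflow.wdl''' and '''inputs.json''' are placeholders for your own workflow and inputs files:

```shell
# Shared caches so Singularity images are not re-downloaded per node
export SINGULARITY_CACHEDIR=$HOME/.singularity/cache
export MINIWDL__SINGULARITY__IMAGE_CACHE=$HOME/.cache/miniwdl
# Shared directory for Slurm logs
mkdir -p logs
# Run the workflow on Slurm with a restartable job store
toil-wdl-runner myworkflow.wdl inputs.json \
    --batchSystem slurm \
    --batchLogsDir ./logs \
    --caching true \
    --jobStore ./jobStore
```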
1fff1e93ec5f7c131e8fe0dee624b9f5a445910b
Phoenix WDL Tutorial
0
45
643
509
2025-02-14T19:00:28Z
Anovak
4
/* Installing Toil with WDL support */
wikitext
text/x-wiki
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like Workflow Description language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to any of the machines with access to the Phoenix cluster. These interactive nodes are fairly large machines that can do some work locally, but you will still want to run larger workflows on the actual cluster. For this tutorial, we will use <code>emerald.prism</code> as our login node.
To connect to the cluster:
1. Connect to the VPN.
2. SSH to <code>emerald.prism</code>. At the command line, run:
ssh emerald.prism
If your username on the cluster (say, <code>flastname</code>) is different from your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@emerald.prism
The first time you connect, you will see a message like:
The authenticity of host 'emerald.prism (10.50.1.67)' can't be established.
ED25519 key fingerprint is SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>emerald.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows.
Toil is written in Python, and the modern way to install Python command line tools is with pipx. So [https://pipx.pypa.io/latest/installation/ install pipx]:
python3 -m pip install --user pipx
python3 -m pipx ensurepath
When installing, you need to specify that you want WDL support. To do this, you can run:
pipx install 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pipx install 'toil[wdl,aws,google]'
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>. The <code>python3 -m pipx ensurepath</code> command should have added the <code>~/.local</code> directory to your <code>PATH</code> environment variable, to ensure you can find these commands.
After that, '''log out and log back in''', to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can run <code>pipx upgrade toil</code>.
==Configuring your Phoenix Environment==
'''Do not try to store data in your home directory on Phoenix!''' The home directories are meant for code and programs. Any data worth running a workflow on should be in a directory under <code>/private/groups</code>. You will probably need to email the admins to get added to a group so you can create a directory to work in somewhere under <code>/private/groups</code>. Usually you would end up with <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code>.
Remember this path; we will need it later.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. But since these files can be large, and the home directory quota is only 30 GB, you might not be able to keep them in your home directory.
We would like to be able to store these on the cluster's large storage array, under <code>/private/groups</code>. However, Toil needs to use file locks in these directories to prevent simultaneous Singularity calls from producing internal Singularity errors, and Ceph currently has [https://tracker.ceph.com/issues/65607 a bug where these file locking operations can freeze the Ceph servers].
If you have '''a small number of container images''' that will fit in your home directory, you can keep them there. [https://github.com/DataBiosphere/toil/commit/cb0b291bb7f6212bfe69221dd9f09d72f83e92fb Since Toil 6.1.0], this is the default behavior and you don't need to do anything. (Unless you previously set <code>SINGULARITY_CACHEDIR</code> or <code>MINIWDL__SINGULARITY__IMAGE_CACHE</code>, in which case you need to unset them.)
'''If you don't have room in your home directory''' for container images, currently the recommended approach is to use node-local storage under <code>/data/tmp</code>. This results in each node pulling each container image, but images will be saved across workflows.
You can set that up for all your workflows with:
echo 'export SINGULARITY_CACHEDIR="/data/tmp/$(whoami)/cache/singularity"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="/data/tmp/$(whoami)/cache/miniwdl"' >>~/.bashrc
Then '''log out and log back in again''', to apply the changes.
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, go to your user directory under <code>/private/groups</code>, and make a directory to work in.
cd /private/groups/YOURGROUPNAME/YOURUSERNAME
mkdir workflow-test
cd workflow-test
Next, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
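Because a typo in the inputs file only surfaces when the workflow starts, it can be worth confirming the file is well-formed JSON first. A small sketch (using the same file and key as above, and <code>python3</code>, which is already required for Toil):

```shell
# Write the inputs file and confirm it parses as JSON before running the workflow.
cat >inputs.json <<'EOF'
{"hello_caller.who": "./names.txt"}
EOF
# json.tool exits nonzero and prints an error if the JSON is malformed.
python3 -m json.tool inputs.json
```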
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --time=02:00:00 --partition=medium --pty bash -i
This will start a new shell that can run for 2 hours; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
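Those <code>\u00f3</code>-style sequences are standard JSON string escapes; any JSON parser turns them back into the real characters. For example:

```shell
# json.loads decodes \uXXXX escapes back to the characters they represent.
python3 -c '
import json
print(json.loads("\"Mridula Resurrecci\\u00f3n\""))
'
# prints: Mridula Resurrección
```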
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. Until Toil [https://github.com/DataBiosphere/toil/issues/4775 gets support for data file caching on Slurm], we will also need the <code>--caching false</code> option. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
Additionally, since [https://github.com/DataBiosphere/toil/issues/4686 Toil can't manage Slurm partitions itself], we will use the <code>TOIL_SLURM_ARGS</code> environment variable to tell Toil how long jobs should be allowed to run for (2 hours) and what [[Slurm Queues (Partitions) and Resource Management | partition]] they should go in.
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --caching false --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
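The file written by <code>-m</code> is plain JSON, so you can pull values out of it with any JSON tool. A sketch, using a made-up stand-in for <code>slurm_run.json</code> (the real file's values will differ):

```shell
# Stand-in for the real slurm_run.json; the values here are invented.
cat >sample_outputs.json <<'EOF'
{"hello_caller.messages": ["Hello, Ada!", "Hello, Grace!"]}
EOF
# Print each element of one of the workflow's output arrays.
python3 -c '
import json
with open("sample_outputs.json") as f:
    outputs = json.load(f)
for message in outputs["hello_caller.messages"]:
    print(message)
'
```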
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
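For example, an inputs file for this workflow only has to set <code>item_count</code>; any of the inputs with defaults may also be given to override them (the values below are arbitrary):

```shell
# An inputs file overriding some optional inputs; only item_count is required.
cat >fizzbuzz_custom.json <<'EOF'
{
  "FizzBuzz.item_count": 15,
  "FizzBuzz.to_buzz": 7,
  "FizzBuzz.fizzbuzz_override": "FizzBuzz!"
}
EOF
# Confirm it is well-formed JSON.
python3 -m json.tool fizzbuzz_custom.json
```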
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
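As a quick illustration of the difference (a sketch using this workflow's variables, not part of the final file):

```wdl
# Conditional *expression*: always produces a value, and needs both branches.
String parity = if (one_based % 2 == 0) then "even" else "odd"

# Conditional *statement*: no else branch; fizz_label is null when the body doesn't run.
if (one_based % to_fizz == 0) {
    String fizz_label = "Fizz"
}
```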
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, but only for numbers where we didn't produce a noise instead.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We're also going to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, which may suggest that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need this, but you do need it in 1.1 and Toil doesn't actually send your outputs anywhere yet if you don't have one, so we're going to make one. We need to collect together all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
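For example, in a minimal sketch (separate from our workflow):

```wdl
scatter (i in [1, 2, 3]) {
    Int doubled = i * 2
}
# Outside the scatter, doubled has type Array[Int] and value [2, 4, 6].
```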
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --caching false --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
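If you want to sanity-check the workflow's output, the same logic is easy to mimic in plain Bash (a sketch mirroring the workflow's default inputs, separate from the WDL):

```shell
# Plain-Bash rendition of the FizzBuzz logic, using the workflow's defaults.
item_count=20; to_fizz=3; to_buzz=5
for ((i = 1; i <= item_count; i++)); do
    if ((i % to_fizz == 0 && i % to_buzz == 0)); then echo "FizzBuzz"
    elif ((i % to_fizz == 0)); then echo "Fizz"
    elif ((i % to_buzz == 0)); then echo "Buzz"
    else echo "$i"
    fi
done | tee fizzbuzz_preview.txt
```

Each line should match the corresponding entry in the workflow's <code>fizzbuzz_results</code> array.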
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Restarting the Workflow==
If you think your workflow failed from a transient problem (such as a Docker image not being available) that you have fixed, and you ran the workflow with <code>--jobStore</code> set manually to a directory that persists between attempts, you can add <code>--restart</code> to your workflow command and make Toil try again. It will pick up from where it left off and rerun any failed tasks and then the rest of the workflow.
This ''will not'' pick up any changes to your WDL source code files; those are read once at the beginning and not re-read on restart.
If restarting the workflow doesn't help, you may need to move on to more advanced debugging techniques.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs and the output of commands run from WDL are reproduced like this.
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which will happen when either the command line command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stderr follows:
And
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stdout follows:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
If you would like individual task logs to be saved separately for later reference, you can use the <code>--writeLogs</code> option to specify a directory to store them. For more information, see [https://toil.readthedocs.io/en/latest/wdl/running.html#managing-workflow-logs the Toil documentation of workflow task logs].
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this.
=== Automatically Fetching Input Files ===
The <code>toil debug-job</code> command has a <code>--retrieveTaskDirectory</code> option that lets you dump out a directory with all the files that a failing WDL task would use. You can use it like:
toil debug-job ./jobstore WDLTaskJob --retrieveTaskDirectory dumpdir
If there are multiple failing tasks, you might need to replace <code>WDLTaskJob</code> with the name of one of the failing jobs. See [https://toil.readthedocs.io/en/latest/running/debugging.html#fetching-job-inputs the Toil documentation on retrieving files] for more on how to use this command.
=== Manually Finding Input Files ===
If you can't use <code>toil debug-job</code>, you might need to manually dig through the job store for files. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try to find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
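If you'd rather not paste URIs into a website, you can do the same decoding locally with Python's standard <code>urllib.parse</code> module:

```shell
# URL-decode the toilfile: URI from the log at the command line.
python3 -c '
import sys, urllib.parse
print(urllib.parse.unquote(sys.argv[1]))
' 'toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam'
```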
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
You should upgrade Toil. [https://github.com/DataBiosphere/toil/commit/ff6bf60ab798a675c20156c749817c4313644b96 Since Toil 6.1.0], Toil no longer issues this warning, and just puts up with bad <code>XDG_RUNTIME_DIR</code> settings.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
e9069759f0e587f54789417a9c2852c4e9337b13
644
643
2025-02-14T19:03:18Z
Anovak
4
wikitext
text/x-wiki
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work at that you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to any of the machines with access to the Phoenix cluster. These interactive nodes are fairly large machines that can do some work locally, but you will still want to run larger workflows on the actual cluster. For this tutorial, we will use <code>emerald.prism</code> as our login node.
To connect to the cluster:
1. Connect to the VPN.
2. SSH to <code>emerald.prism</code>. At the command line, run:
ssh emerald.prism
If your username on the cluster (say, <code>flastname</code>) is different from your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@emerald.prism
The first time you connect, you will see a message like:
The authenticity of host 'emerald.prism (10.50.1.67)' can't be established.
ED25519 key fingerprint is SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>emerald.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows.
Toil is written in Python, and the modern way to install Python command line tools is with pipx. So [https://pipx.pypa.io/latest/installation/ install pipx]:
python3 -m pip install --user pipx
python3 -m pipx ensurepath
This may instruct you to log out and log back in or take some other action to adopt the new <code>PATH</code> settings.
When installing Toil, you need to specify that you want WDL support. To do this, you can run:
pipx install 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pipx install 'toil[wdl,aws,google]'
To change what extras are used when you have an existing Toil installation, you will need to use the <code>--force</code> option.
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>. The <code>python3 -m pipx ensurepath</code> command should have added the <code>~/.local/bin</code> directory to your <code>PATH</code> environment variable, to ensure you can find these commands.
After that, '''log out and log back in''', to restart bash and pick up the change.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can run <code>pipx upgrade toil</code>, or repeat the <code>pipx install</code> command above with the <code>--force</code> option.
==Configuring your Phoenix Environment==
'''Do not try to store data in your home directory on Phoenix!''' The home directories are meant for code and programs. Any data worth running a workflow on should be in a directory under <code>/private/groups</code>. You will probably need to email the admins to get added to a group so you can create a directory to work in somewhere under <code>/private/groups</code>. Usually you would end up with <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code>.
Remember this path; we will need it later.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. Since these images can be large, and the home directory quota is only 30 GB, you might not be able to keep them in your home directory.
We would like to be able to store these on the cluster's large storage array, under <code>/private/groups</code>. However, Toil needs to use file locks in these directories to prevent simultaneous Singularity calls from producing internal Singularity errors, and Ceph currently has [https://tracker.ceph.com/issues/65607 a bug where these file locking operations can freeze the Ceph servers].
If you have '''a small number of container images''' that will fit in your home directory, you can keep them there. [https://github.com/DataBiosphere/toil/commit/cb0b291bb7f6212bfe69221dd9f09d72f83e92fb Since Toil 6.1.0], this is the default behavior and you don't need to do anything. (Unless you previously set <code>SINGULARITY_CACHEDIR</code> or <code>MINIWDL__SINGULARITY__IMAGE_CACHE</code>, in which case you need to unset them.)
'''If you don't have room in your home directory''' for container images, currently the recommended approach is to use node-local storage under <code>/data/tmp</code>. This results in each node pulling each container image, but images will be saved across workflows.
You can set that up for all your workflows with:
echo 'export SINGULARITY_CACHEDIR="/data/tmp/$(whoami)/cache/singularity"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="/data/tmp/$(whoami)/cache/miniwdl"' >>~/.bashrc
Then '''log out and log back in again''', to apply the changes.
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, go to your user directory under <code>/private/groups</code>, and make a directory to work in.
cd /private/groups/YOURGROUPNAME/YOURUSERNAME
mkdir workflow-test
cd workflow-test
Next, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
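To see how that relative path resolution works, here is a small sketch that recreates this tutorial's files in a scratch directory and resolves the <code>File</code> value against the inputs file's own location (an illustration; <code>toil-wdl-runner</code> does its own resolution internally):

```python
import json
import os
import tempfile

# Recreate the tutorial's names.txt and inputs.json in a scratch directory.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "names.txt"), "w") as f:
    f.write("Mridula Resurrección\n")
inputs_path = os.path.join(workdir, "inputs.json")
with open(inputs_path, "w") as f:
    json.dump({"hello_caller.who": "./names.txt"}, f)

# Relative paths in the inputs file are taken relative to the inputs
# file's own directory, so resolve against that.
with open(inputs_path) as f:
    who = json.load(f)["hello_caller.who"]
resolved = os.path.normpath(os.path.join(os.path.dirname(inputs_path), who))
print(resolved)
```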
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --time=02:00:00 --partition=medium --pty bash -i
This will start a new shell on a worker node that can run for up to 2 hours.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
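Those <code>\u00f3</code> sequences are ordinary JSON string escapes, so any JSON parser will restore the real characters, for example:

```python
import json

# The \u00f3 sequences in the printed JSON are standard Unicode escapes;
# parsing the output restores the actual characters.
printed = '{"hello_caller.messages": ["Hello, Mridula Resurrecci\\u00f3n!"]}'
output = json.loads(printed)
print(output["hello_caller.messages"][0])  # Hello, Mridula Resurrección!
```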
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. Until Toil [https://github.com/DataBiosphere/toil/issues/4775 gets support for data file caching on Slurm], we will also need the <code>--caching false</code> option. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
Additionally, since [https://github.com/DataBiosphere/toil/issues/4686 Toil can't manage Slurm partitions itself], we will use the <code>TOIL_SLURM_ARGS</code> environment variable to tell Toil how long jobs should be allowed to run for (2 hours) and what [[Slurm Queues (Partitions) and Resource Management | partition]] they should go in.
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --caching false --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access (at least for the numbers where we didn't produce a noise instead).
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
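To convince yourself that the scatter-and-conditionals logic produces what you expect, here is a Python sketch of the same rules (the <code>select_first</code> helper below is a stand-in for the WDL function, not part of any library):

```python
def select_first(values):
    # Stand-in for WDL select_first(): return the first non-null value.
    return next(v for v in values if v is not None)

def fizzbuzz_value(one_based, to_fizz=3, to_buzz=5, fizzbuzz_override=None):
    # Mirror the WDL conditionals: un-executed branches leave null behind.
    fizz = "Fizz" if one_based % to_fizz == 0 else None
    buzz = "Buzz" if one_based % to_buzz == 0 else None
    fizzbuzz = None
    if fizz and buzz:
        fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
    number = str(one_based) if not fizz and not buzz else None
    return select_first([fizzbuzz, fizz, buzz, number])

print([fizzbuzz_value(i + 1) for i in range(15)])
# ['1', '2', 'Fizz', '4', 'Buzz', 'Fizz', '7', '8', 'Fizz', 'Buzz',
#  '11', 'Fizz', '13', '14', 'FizzBuzz']
```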
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> function returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
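As a rough sketch of the <code>read_string(stdout())</code> semantics (an illustration in Python, not Toil's actual implementation):

```python
def read_string(path):
    # Sketch of WDL read_string(): read the whole file as text and
    # strip any trailing newlines.
    with open(path) as f:
        return f.read().rstrip("\n")

# Simulate a task's captured standard output (hypothetical file name).
with open("stdout.txt", "w") as f:
    f.write("42\n")
print(read_string("stdout.txt"))  # 42
```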
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We're also going to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, which may suggest that it be <code>SSD</code> storage.
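As an illustration of the shape of that string (this parser is a sketch for explanation, not Toil's actual code), the Cromwell-style value splits into three whitespace-separated fields:

```python
def parse_disks(disks):
    # Split a Cromwell-style disks string like "local-disk 1 SSD" into
    # a mount name, a size in gigabytes, and a disk type.
    mount, size_gb, disk_type = disks.split()
    return mount, int(size_gb), disk_type

print(parse_disks("local-disk 1 SSD"))  # ('local-disk', 1, 'SSD')
```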
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need this, but you do need it in 1.1, and Toil doesn't actually send your outputs anywhere yet if you don't have one, so we're going to make one. We need to collect all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --caching false --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Restarting the Workflow==
If you think your workflow failed from a transient problem (such as a Docker image not being available) that you have fixed, and you ran the workflow with <code>--jobStore</code> set manually to a directory that persists between attempts, you can add <code>--restart</code> to your workflow command and make Toil try again. It will pick up from where it left off and rerun any failed tasks and then the rest of the workflow.
This ''will not'' pick up any changes to your WDL source code files; those are read once at the beginning and not re-read on restart.
If restarting the workflow doesn't help, you may need to move on to more advanced debugging techniques.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs and the output of commands run from WDL are reproduced like this.
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which will happen when either the command line command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stderr follows:
And
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stdout follows:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
If you would like individual task logs to be saved separately for later reference, you can use the <code>--writeLogs</code> option to specify a directory to store them. For more information, see [https://toil.readthedocs.io/en/latest/wdl/running.html#managing-workflow-logs the Toil documentation of workflow task logs].
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this.
=== Automatically Fetching Input Files ===
The <code>toil debug-job</code> command has a <code>--retrieveTaskDirectory</code> option that lets you dump out a directory with all the files that a failing WDL task would use. You can use it like:
toil debug-job ./jobstore WDLTaskJob --retrieveTaskDirectory dumpdir
If there are multiple failing tasks, you might need to replace <code>WDLTaskJob</code> with the name of one of the failing jobs. See [https://toil.readthedocs.io/en/latest/running/debugging.html#fetching-job-inputs the Toil documentation on retrieving files] for more on how to use this command.
=== Manually Finding Input Files ===
If you can't use <code>toil debug-job</code>, you might need to manually dig through the job store for files. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
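Once you have the job-store-relative path, retrieving the file is just a copy. Here is a sketch of that, using a simulated job store in a temporary directory so it is self-contained; on the cluster you would substitute your real <code>--jobStore</code> path and the file ID from your own log.

```shell
# Sketch with a simulated job store; substitute your real --jobStore path
# and the file ID from your failing job's log.
JOBSTORE=$(mktemp -d)   # stands in for e.g. /private/groups/patenlab/anovak/jobstore
FILE_ID='files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam'
mkdir -p "$JOBSTORE/$(dirname "$FILE_ID")"
echo 'fake BAM data' > "$JOBSTORE/$FILE_ID"

# The retrieval itself is just: job store path + relative file ID.
cp "$JOBSTORE/$FILE_ID" ./Sample.bam
```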
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
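If you would rather not paste the URI into a web page, you can do the same decoding at the command line. This sketch uses Python's standard-library <code>urllib.parse.unquote</code> and shell substring removal to pull out the job-store-relative path:

```shell
# Decode the toilfile: URI from the log and extract the part after the
# last colon, which is the path relative to the job store.
URI='toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam'
DECODED=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote(sys.argv[1]))' "$URI")
REL_PATH="${DECODED##*:}"   # strip everything up to and including the last colon
echo "$REL_PATH"
```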
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
You should upgrade Toil. [https://github.com/DataBiosphere/toil/commit/ff6bf60ab798a675c20156c749817c4313644b96 Since Toil 6.1.0], Toil no longer issues this warning, and just puts up with bad <code>XDG_RUNTIME_DIR</code> settings.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
649c1d3c62a9a3924a66c7e94998f530a4f558b9
645
644
2025-02-14T19:09:08Z
Anovak
4
/* Installing Toil with WDL support */
wikitext
text/x-wiki
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to any of the machines with access to the Phoenix cluster. These interactive nodes are fairly large machines that can do some work locally, but you will still want to run larger workflows on the actual cluster. For this tutorial, we will use <code>emerald.prism</code> as our login node.
To connect to the cluster:
1. Connect to the VPN.
2. SSH to <code>emerald.prism</code>. At the command line, run:
ssh emerald.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@emerald.prism
The first time you connect, you will see a message like:
The authenticity of host 'emerald.prism (10.50.1.67)' can't be established.
ED25519 key fingerprint is SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>emerald.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows.
Toil is written in Python, and the modern way to install Python command line tools is with pipx. So [https://pipx.pypa.io/latest/installation/ install pipx]:
python3 -m pip install --user pipx
python3 -m pipx ensurepath
This may instruct you to '''log out and log back in''' or take some other action to adopt the new <code>PATH</code> settings.
When installing Toil, you need to specify that you want WDL support. To do this, you can run:
pipx install 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pipx install 'toil[wdl,aws,google]'
To change which extras are used in an existing Toil installation, you will need to reinstall with the <code>--force</code> option (for example, <code>pipx install --force 'toil[wdl,aws,google]'</code>).
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>. The <code>python3 -m pipx ensurepath</code> command should have added the <code>~/.local/bin</code> directory to your <code>PATH</code> environment variable, to ensure you can find these commands.
If you see something from <code>pipx</code> like:
- cwltoil (symlink missing or pointing to unexpected location)
Then <code>pipx uninstall toil</code>, remove the offending file from <code>~/.local/bin</code>, and try again.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can run <code>pipx upgrade toil</code>.
==Configuring your Phoenix Environment==
'''Do not try and store data in your home directory on Phoenix!''' The home directories are meant for code and programs. Any data worth running a workflow on should be in a directory under <code>/private/groups</code>. You will probably need to email the admins to get added to a group so you can create a directory to work in somewhere under <code>/private/groups</code>. Usually you would end up with <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code>.
Remember this path; we will need it later.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. On the Phoenix cluster, however, we do have a shared filesystem, so we should configure Toil to use it for caching the Docker container images used for running workflow steps. But since these images can be large, and the home directory quota is only 30 GB, you might not be able to keep them in your home directory.
We would like to be able to store these on the cluster's large storage array, under <code>/private/groups</code>. However, Toil needs to use file locks in these directories to prevent simultaneous Singularity calls from producing internal Singularity errors, and Ceph currently has [https://tracker.ceph.com/issues/65607 a bug where these file locking operations can freeze the Ceph servers].
If you have '''a small number of container images''' that will fit in your home directory, you can keep them there. [https://github.com/DataBiosphere/toil/commit/cb0b291bb7f6212bfe69221dd9f09d72f83e92fb Since Toil 6.1.0], this is the default behavior and you don't need to do anything. (Unless you previously set <code>SINGULARITY_CACHEDIR</code> or <code>MINIWDL__SINGULARITY__IMAGE_CACHE</code>, in which case you need to unset them.)
'''If you don't have room in your home directory''' for container images, currently the recommended approach is to use node-local storage under <code>/data/tmp</code>. This results in each node pulling each container image, but images will be saved across workflows.
You can set that up for all your workflows with:
echo 'export SINGULARITY_CACHEDIR="/data/tmp/$(whoami)/cache/singularity"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="/data/tmp/$(whoami)/cache/miniwdl"' >>~/.bashrc
Then '''log out and log back in again''', to apply the changes.
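After you log back in, you can confirm the setting took effect by echoing the variable. Note that because the <code>echo</code> lines above use single quotes, <code>$(whoami)</code> is written into <code>~/.bashrc</code> literally and expands at login time, per user:

```shell
# What ~/.bashrc now does at each login (the $(whoami) expands here):
export SINGULARITY_CACHEDIR="/data/tmp/$(whoami)/cache/singularity"
echo "$SINGULARITY_CACHEDIR"   # should name a directory under /data/tmp/<your username>
```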
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, go to your user directory under <code>/private/groups</code>, and make a directory to work in.
cd /private/groups/YOURGROUPNAME/YOURUSERNAME
mkdir workflow-test
cd workflow-test
Next, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
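Before kicking off a long run, it can be worth linting the inputs file, since a stray comma or quote is a common cause of an immediate failure. This generic (not Toil-specific) sketch uses Python's built-in <code>json.tool</code> module:

```shell
# Recreate the inputs file from above, then lint it; json.tool exits
# nonzero and prints an error location if the JSON is malformed.
echo '{"hello_caller.who": "./names.txt"}' > inputs.json
python3 -m json.tool inputs.json
```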
==Testing at small scale single-machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --time=02:00:00 --partition=medium --pty bash -i
This will start a new shell that can run for 2 hours; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare for a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. Until Toil [https://github.com/DataBiosphere/toil/issues/4775 gets support for data file caching on Slurm], we will also need the <code>--caching false</code> option. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
Additionally, since [https://github.com/DataBiosphere/toil/issues/4686 Toil can't manage Slurm partitions itself], we will use the <code>TOIL_SLURM_ARGS</code> environment variable to tell Toil how long jobs should be allowed to run for (2 hours) and what [[Slurm Queues (Partitions) and Resource Management | partition]] they should go in.
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --caching false --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
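To confirm the run produced what you expect, you can just count the greeting files. This sketch uses a stand-in directory so it is self-contained; on the cluster you would count files in <code>slurm_run</code> directly:

```shell
# OUT_DIR stands in for ./slurm_run here, populated with 100 fake
# greeting files; on the cluster, just run the ls pipeline on slurm_run.
OUT_DIR=$(mktemp -d)
for i in $(seq 1 100); do
  echo "Hello, person $i!" > "$OUT_DIR/greeting_$i.txt"
done
ls "$OUT_DIR" | wc -l   # expect 100
```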
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, at least when we didn't make a noise instead.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. And we're going to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, which may suggest that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need one, but you do need it in 1.1, and Toil doesn't actually send your outputs anywhere yet if you don't have one, so we're going to make one. We need to collect all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
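Before running it, it helps to know what output to expect. This plain Bash sketch (not part of the workflow) mirrors the workflow's logic with the same defaults (<code>to_fizz</code> = 3, <code>to_buzz</code> = 5), so you can predict <code>fizzbuzz_results</code> for any <code>item_count</code>:

```shell
# Bash mirror of the FizzBuzz workflow logic, for sanity-checking output.
item_count=20; to_fizz=3; to_buzz=5
results=()
for ((i = 1; i <= item_count; i++)); do
  if ((i % to_fizz == 0 && i % to_buzz == 0)); then
    results+=("FizzBuzz")       # multiple of both
  elif ((i % to_fizz == 0)); then
    results+=("Fizz")
  elif ((i % to_buzz == 0)); then
    results+=("Buzz")
  else
    results+=("$i")             # just a normal number, stringified
  fi
done
printf '%s\n' "${results[@]}"
```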
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --caching false --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
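The file given to <code>-m</code> holds the workflow outputs as JSON. You can pull the results array out of it with a little Python; the output file is simulated here (with only five entries) so the sketch is self-contained:

```shell
# Simulated -m output file; a real run writes fizzbuzz_out.json for you.
echo '{"FizzBuzz.fizzbuzz_results": ["1", "2", "Fizz", "4", "Buzz"]}' > fizzbuzz_out.json
# Print one result per line:
python3 -c 'import json; print("\n".join(json.load(open("fizzbuzz_out.json"))["FizzBuzz.fizzbuzz_results"]))'
```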
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Restarting the Workflow==
If you think your workflow failed from a transient problem (such as a Docker image not being available) that you have fixed, and you ran the workflow with <code>--jobStore</code> set manually to a directory that persists between attempts, you can add <code>--restart</code> to your workflow command and make Toil try again. It will pick up from where it left off and rerun any failed tasks and then the rest of the workflow.
This ''will not'' pick up any changes to your WDL source code files; those are read once at the beginning and not re-read on restart.
If restarting the workflow doesn't help, you may need to move on to more advanced debugging techniques.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs and the output of commands run from WDL are reproduced like this.
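If you saved the main Toil log to a file, you can slice the per-job sections back out with <code>awk</code>, since the markers are literal text. The log file name here is hypothetical, and a tiny stand-in log is built so the example is self-contained:

```shell
# Build a tiny stand-in log; on a real run, point awk at your saved
# Toil log file instead.
printf '%s\n' 'other logging' '=========>' 'Toil job log is here' '<=========' 'more logging' > workflow.log

# Print every job-log section, markers included:
awk '/=========>/,/<=========/' workflow.log
```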
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code. This happens either when the command itself is written incorrectly, or when the error detection code in the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stderr follows:
And
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stdout follows:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
If you would like individual task logs to be saved separately for later reference, you can use the <code>--writeLogs</code> option to specify a directory to store them. For more information, see [https://toil.readthedocs.io/en/latest/wdl/running.html#managing-workflow-logs the Toil documentation of workflow task logs].
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this.
=== Automatically Fetching Input Files ===
The <code>toil debug-job</code> command has a <code>--retrieveTaskDirectory</code> option that lets you dump out a directory with all the files that a failing WDL task would use. You can use it like:
toil debug-job ./jobstore WDLTaskJob --retrieveTaskDirectory dumpdir
If there are multiple failing tasks, you might need to replace <code>WDLTaskJob</code> with the name of one of the failing jobs. See [https://toil.readthedocs.io/en/latest/running/debugging.html#fetching-job-inputs the Toil documentation on retrieving files] for more on how to use this command.
=== Manually Finding Input Files ===
If you can't use <code>toil debug-job</code>, you might need to manually dig through the job store for files. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
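You can also reconstruct that path at the shell, by extracting the quoted file ID from the log line and prepending your <code>--jobStore</code> value (a sketch, using the example log line and job store path from above):

```shell
# Example log line copied from the failing task's log
log_line="Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'"
job_store=/private/groups/patenlab/anovak/jobstore  # your --jobStore value
# Pull out the first single-quoted string: the job-store-relative file ID
file_id=$(printf '%s\n' "$log_line" | sed -n "s/.*Downloaded file '\([^']*\)'.*/\1/p")
echo "$job_store/$file_id"
# → /private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
```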
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
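If you prefer to stay at the command line, a Python one-liner (assuming <code>python3</code> is available) does the same URL-decoding:

```shell
# URL-decode the toilfile: URI from the log (example URI from above)
python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote(sys.argv[1]))' \
    'toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam'
# → toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
```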
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
You should upgrade Toil. [https://github.com/DataBiosphere/toil/commit/ff6bf60ab798a675c20156c749817c4313644b96 Since Toil 6.1.0], Toil no longer issues this warning, and just puts up with bad <code>XDG_RUNTIME_DIR</code> settings.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
9e9dea7fe5823103dae229ea3fe57a82541c98e2
Requirement for users to get GI VPN access
0
9
647
597
2025-02-20T20:47:12Z
Weiler
3
wikitext
text/x-wiki
Before you are allowed access to our firewalled/secure area ("Prism"), you have to complete 3 items and provide the completed certificates or forms:
'''1''': You must take and complete the NIH Public Security Refresher Course online. You must complete the course in a single continuous sitting:
https://irtsectraining.nih.gov/public.aspx
Click on the "Enter Public Training Portal" near the bottom of the page. The course is titled "2024 Information Security, Insider Threats, Privacy Awareness, Records Management and Emergency Preparedness Refresher". At the end you will be able to save the completion certificate that should have your name on it.
'''2''': You need to sign the Genomics Institute VPN User Agreement (digital signature OK), located here for download:
[[Media:GI_VPN_Policy.pdf]]
'''3''': Please read and sign the last page of the NIH Genomic Data Sharing Policy agreement (digital signature OK), located here for download. By signing the document you agree that you have read and understand the policies described therein and that you agree to abide by those policies:
[[Media:NIH_GDS_Policy.pdf]]
When you have the three documents described above ready, please complete this form: https://app.smartsheet.com/b/form/a76dbd90ba0240ab9ea9d39b390586ce.
There are two parts to this process.
1. For the user, please fill in ALL required fields '''and attach''' all three required documents described above. The form then goes to your PI for approval - remind them to approve it, or it won't get sent to us for processing!
2. For the Sponsor/PI - you will receive an email from Smartsheets. Please fill in all required fields and submit.
We will receive your completed request and we will create your account, then you will receive a welcome email with instructions on how to configure your VPN client and gain access to our systems.
When using the VPN software off-campus, it will usually work unless the wireless network you are on has restrictions preventing it from functioning. Some other universities have such restrictions (notably UCSF), but most other wireless networks and home networks should work fine.
'''PLEASE NOTE:''' Because of the overhead required in setting up VPN access, please only request access if you have an immediate need to work on data that exists behind the firewall. We have had a decent number of people request access and go through the setup but then never use it. In other words, please do not request access because "one day you might need it", but because you '''do''' actually need it!
'''ALSO NOTE:''' VPN accounts typically expire after one year from the date of first gaining access. To renew for another year you will need your PI/sponsor to send us a note asking for renewal.
c30fda3ee58c3d9fe856694f63f9b359a8e82bf5
Resetting your VPN/PRISM Password
0
60
648
554
2025-02-20T22:45:52Z
Weiler
3
wikitext
text/x-wiki
If you have forgotten your VPN password (which is also your PRISM UNIX password), send an email to '''cluster-admin@soe.ucsc.edu''' requesting that your password be reset (include your username in the request).
Once we have sent you your new temporary password, you will need to:
1: Log into the PRISM VPN using this new temporary password.
2: Log into one of the servers behind the firewall (mustard, emerald, crimson or razzmatazz) using your new temporary password.
3: Once you log in there, it should ask you to type in your temporary password one more time, then it will ask you to choose a new password. If it does not ask you to change your password (because you are logging in with SSH public keys), use the '''passwd''' command to change your password. Once you choose a new password (and type it twice for confirmation), log out of your SSH session. '''NOTE:''' Your new password must be 10 characters long, using three or more character classes (lowercase, uppercase, number or special character).
4: Log out (disconnect) from the VPN. '''This step is very important!'''
5: Log back into the VPN using your '''new''' password that you chose in step 3.
6: Log back into one of the servers (mustard, emerald, crimson or razzmatazz) using your new password.
Assuming all that works, your password has been reset. You cannot reset your password to one of the prior five passwords you have used for your account.
1cadb808490eaf5ff257f1cab142eabaed9fbd0b
Duo Pushes Aren't Being Sent to My Phone!
0
73
654
2025-03-15T15:31:17Z
Weiler
3
Created page with "If you are having an issue such that you are trying to login to the Genomics Institute MFA VPN service and you are typing your username and password correctly, but you aren't receiving a Duo Push on your phone (and then the login times out), follow these steps to troubleshoot it. First, did you enroll your phone in Duo MFA when you set up your CruzID? If not, follow these instructions to get started: https://its.ucsc.edu/mfa/enroll.html If you have already done that..."
wikitext
text/x-wiki
If you are having an issue such that you are trying to login to the Genomics Institute MFA VPN service and you are typing your username and password correctly, but you aren't receiving a Duo Push on your phone (and then the login times out), follow these steps to troubleshoot it.
First, did you enroll your phone in Duo MFA when you set up your CruzID? If not, follow these instructions to get started:
https://its.ucsc.edu/mfa/enroll.html
If you have already done that and you have successfully received Duo Pushes in the past, then follow these steps to debug it:
# If you are on Wifi, try disabling Wifi and just use your phone’s cellular connection. Then try logging in again.
# Make sure that notifications are enabled on the Duo App. Sometimes they weirdly “disable”, and the pushes don’t come in.
# Make sure your phone isn’t in “Do Not Disturb” or “Focus” mode. Sometimes folks have Focus/Do Not Disturb turn on at a certain time of night, which can cause Duo to stop working. If that was the case with you, it may work during the daytime but not in the evening.
# Reboot your phone! You never know.
# Double check that the time and date are correct on your phone. If they aren’t, Duo stops working.
# We’re getting to the bottom of the barrel here. Try “pulling down” on the Duo App screen to see if it refreshes any pending notifications.
# If none of that works, then you may have to re-initialize Duo altogether on your phone, which we can help with.
e33bd71d12f3e84296ad0d67b3635daae34b3bb3
Converting From Non-MFA VPN to the MFA-Enabled VPN on Windows
0
72
657
637
2025-03-17T14:17:49Z
Anovak
4
wikitext
text/x-wiki
If you are using OpenVPN Connect on Windows 10 or 11 to connect to the GI VPN, and you are looking to convert to the new MFA-enabled GI VPN, you have come to the right place. You must already have Duo set up with your CruzID (which most of you do). If for some reason you don't have Duo set up yet on your phone, go here to enroll a device and configure Push Notifications with Duo before continuing:
https://its.ucsc.edu/mfa/enroll.html
OK! Let's get to it.
Disconnect from the VPN if you are already connected.
Then you will need to download the new OpenVPN config file from here:
https://giwiki.gi.ucsc.edu/downloads/prism-duo.ovpn
The credentials to access that website are username: '''genecats''' and password: '''KiloKluster'''
Download that file by right-clicking on the link above and selecting "Save Link As...", and save it to your Desktop or some other area you will remember. Launch the '''OpenVPN GUI''' app (usually there is an icon for it on your Desktop, but you can search for it if not). It will launch and appear in your system tray on the bottom right (the system tray icon kind of looks like a '''^''' icon). You should see the OpenVPN icon there; it looks like a little computer screen with a lock on it. Right-click on the OpenVPN icon in the system tray, and you should see a small menu appear. Select "Import file". In the resulting window, browse to your Desktop or wherever you saved the '''prism-duo.ovpn''' file. Select that file and click "Open".
Once you import the file, you should be able to right click on OpenVPN Connect again in the system tray and select the profile you want to connect to. It should show multiple profiles, one for your old profile and one for your new profile. Select the new one, then select "Connect".
That's it! It will ask you for your usual GI PRISM username and password that you usually use to connect to our VPN, and after that it will send a Duo Push notification to your phone, and then you should be logged in. Other than the Duo Push, the VPN behaves exactly like it did before. If you need to use an authentication method other than Duo Push, you can append a comma, and then the name of the method (like "push", "sms", or "phone"), or a numeric second factor code, to your password when you submit it.
If you have issues you can always revert back to the old configuration, which will still work for a while. We will disable the old VPN soon though, so make every effort to get the new VPN setup working.
As always, please email '''cluster-admin@soe.ucsc.edu''' if you need help or have any questions.
868a85f48448452577598342ab85c132f84f9c4f
Converting From Non-MFA VPN to the MFA-Enabled VPN on MacOS
0
70
658
652
2025-03-17T14:18:16Z
Anovak
4
wikitext
text/x-wiki
If you are using Tunnelblick on MacOS and you are looking to convert to the new MFA-enabled GI VPN, you have come to the right place. You must already have Duo set up with your CruzID (which most of you do). If for some reason you don't have Duo set up yet on your phone, go here to enroll a device and configure Push Notifications with Duo before continuing:
https://its.ucsc.edu/mfa/enroll.html
OK! Let's get to it.
Disconnect from the VPN if you are already connected.
Then you will need to download the new OpenVPN config file from here:
https://giwiki.gi.ucsc.edu/downloads/
The credentials to access that website are username: '''genecats''' and password: '''KiloKluster'''
Go to the link above, right-click on '''prism-duo.ovpn''', select "Save Link As...", and save it to your Desktop or some other area you will remember. Then open Tunnelblick and click on the Tunnelblick icon on the top right of your screen next to the date. It kind of looks like a small tunnel. In the window that opens, select "VPN Details...".
In the resulting window, select the "Configurations" tab on the top. You will see a list of Configurations on the left, and it should include the current configuration you use to connect. It may be called 'prism' or maybe 'client'.
Drag the new configuration called '''prism-duo.ovpn''' (from your Desktop) into the Configurations area beneath your old configuration. It should import the configuration. It will ask you if you want to install it for "Only You" or "All Users". Click "Only You". You will also be asked to type in your laptop password.
That's it! Select the new configuration on the left and click the "Connect" button on the bottom right. It will ask you for your usual GI PRISM username and password that you usually use to connect to our VPN, and after that it will send a Duo Push notification to your phone, and then you should be logged in. Other than the Duo Push, the VPN behaves exactly like it did before. If you need to use an authentication method other than Duo Push, you can append a comma, and then the name of the method (like "push", "sms", or "phone"), or a numeric second factor code, to your password when you submit it.
If you have issues you can always revert back to the old configuration, which will still work for a while. We will disable the old VPN soon though, so make every effort to get the new VPN setup working.
Once you have the new VPN working, feel free to delete the old profile from Tunnelblick by clicking on the old profile in the "Configurations" window, then click on the '''"-"''' button below to remove the old configuration.
As always, please email '''cluster-admin@soe.ucsc.edu''' if you need help or have any questions.
6520f984059fa6952cf777d133675225cc94ae51
Converting From Non-MFA VPN to the MFA-Enabled VPN on Linux
0
71
659
638
2025-03-17T14:18:44Z
Anovak
4
wikitext
text/x-wiki
If you are using OpenVPN on Linux to connect to the GI VPN and you are looking to convert to the new MFA-enabled GI VPN, you have come to the right place. You must already have Duo set up with your CruzID (which most of you do). If for some reason you don't have Duo set up yet on your phone, go here to enroll a device and configure Push Notifications with Duo before continuing:
https://its.ucsc.edu/mfa/enroll.html
OK! Let's get to it.
Disconnect from the VPN if you are already connected. The various flavors and versions of Linux differ in the specifics, so these exact instructions may not match your system. This guide is based on the Network Manager in Ubuntu, but most Ubuntu/Debian variants will be similar.
Then you will need to download the new OpenVPN config file from here:
https://giwiki.gi.ucsc.edu/downloads/prism-duo.ovpn
The credentials to access that website are username: '''genecats''' and password: '''KiloKluster'''
Download that file by right-clicking on the link above, selecting "Save Link As...", and saving it to your Desktop or some other easy-to-remember location.
We will be installing the Prism VPN profile via the Network Manager GUI interface.
Open '''Network Manager''' from '''Gnome Settings''' option and select the '''Network''' tab and click on the '''VPN +''' symbol:
[[File:Configuring_1.png|600px]]
From the '''Add VPN''' window, click on the '''Import from file...''' option:
[[File:Configuring_2.png|600px]]
You must navigate to your .ovpn file (/path/to/your/prism-duo.ovpn) and click on '''Open''' button:
[[File:Configuring_3.png|600px]]
Click on the '''Add''' button:
[[File:Configuring_4.png|600px]]
Finally, click the '''On/Off''' button to turn on the new VPN:
[[File:Configuring_5.png|600px]]
That's it! It will ask you for your usual GI PRISM username and password that you usually use to connect to our VPN, and after that it will send a Duo Push notification to your phone, and then you should be logged in. Other than the Duo Push, the VPN behaves exactly like it did before. If you need to use an authentication method other than Duo Push, you can append a comma, and then the name of the method (like "push", "sms", or "phone"), or a numeric second factor code, to your password when you submit it.
If you have issues you can always revert back to the old configuration, which will still work for a while. We will disable the old VPN soon though, so make every effort to get the new VPN setup working.
Once you have the new VPN working, feel free to delete the old profile from the Network Manager.
As always, please email '''cluster-admin@soe.ucsc.edu''' if you need help or have any questions.
b78ad3ddb1b152cd49c9ab0b861b117021d800aa
Phoenix WDL Tutorial
0
45
662
645
2025-03-18T18:42:50Z
Anovak
4
/* Frequently Asked Questions */
wikitext
text/x-wiki
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to any of the machines with access to the Phoenix cluster. These interactive nodes are fairly large machines that can do some work locally, but you will still want to run larger workflows on the actual cluster. For this tutorial, we will use <code>emerald.prism</code> as our login node.
To connect to the cluster:
1. Connect to the VPN.
2. SSH to <code>emerald.prism</code>. At the command line, run:
ssh emerald.prism
If your username on the cluster (say, <code>flastname</code>) is different from your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@emerald.prism
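If you connect often, you can optionally save that username in your SSH client configuration so that a plain <code>ssh emerald</code> works. This is a sketch of an entry you could add to <code>~/.ssh/config</code>; the alias name and the username are examples, not anything that exists already:

```
Host emerald
    HostName emerald.prism
    User flastname
```

With this in place, <code>ssh emerald</code> connects as <code>flastname@emerald.prism</code>.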
The first time you connect, you will see a message like:
The authenticity of host 'emerald.prism (10.50.1.67)' can't be established.
ED25519 key fingerprint is SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide whether it is talking to the genuine <code>emerald.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows.
Toil is written in Python, and the modern way to install Python command line tools is with pipx. So [https://pipx.pypa.io/latest/installation/ install pipx]:
python3 -m pip install --user pipx
python3 -m pipx ensurepath
This may instruct you to '''log out and log back in''' or take some other action to adopt the new <code>PATH</code> settings.
When installing Toil, you need to specify that you want WDL support. To do this, you can run:
pipx install 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pipx install 'toil[wdl,aws,google]'
To change which extras are used in an existing Toil installation, re-run <code>pipx install</code> with the <code>--force</code> option, for example <code>pipx install --force 'toil[wdl,aws]'</code>.
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>. The <code>python3 -m pipx ensurepath</code> command should have added the <code>~/.local</code> directory to your <code>PATH</code> environment variable, to ensure you can find these commands.
If you see something from <code>pipx</code> like:
- cwltoil (symlink missing or pointing to unexpected location)
Then <code>pipx uninstall toil</code>, remove the offending file from <code>~/.local/bin</code>, and try again.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
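You can also double-check what pipx actually installed. This is a generic pipx command, not anything Toil-specific, and it is guarded so it prints a message rather than failing if pipx or Toil is missing:

```shell
# List pipx-managed applications and pick out Toil, if present
TOIL_INFO=$(pipx list 2>/dev/null | grep -i toil || echo "Toil not found via pipx")
echo "$TOIL_INFO"
```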
If you ever want to upgrade Toil to a new release, you can run <code>pipx upgrade toil</code>, or repeat the <code>pipx install</code> command above with <code>--force</code>.
==Configuring your Phoenix Environment==
'''Do not try to store data in your home directory on Phoenix!''' The home directories are meant for code and programs. Any data worth running a workflow on should be in a directory under <code>/private/groups</code>. You will probably need to email the admins to get added to a group so you can create a directory to work in somewhere under <code>/private/groups</code>. Usually you would end up with <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code>.
Remember this path; we will need it later.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. On the Phoenix cluster we do have a shared filesystem, so we should configure Toil to use it for caching the Docker container images used to run workflow steps. However, since these images can be large, and the home directory quota is only 30 GB, you might not be able to keep them in your home directory.
We would like to be able to store these on the cluster's large storage array, under <code>/private/groups</code>. However, Toil needs to use file locks in these directories to prevent simultaneous Singularity calls from producing internal Singularity errors, and Ceph currently has [https://tracker.ceph.com/issues/65607 a bug where these file locking operations can freeze the Ceph servers].
If you have '''a small number of container images''' that will fit in your home directory, you can keep them there. [https://github.com/DataBiosphere/toil/commit/cb0b291bb7f6212bfe69221dd9f09d72f83e92fb Since Toil 6.1.0], this is the default behavior and you don't need to do anything. (Unless you previously set <code>SINGULARITY_CACHEDIR</code> or <code>MINIWDL__SINGULARITY__IMAGE_CACHE</code>, in which case you need to unset them.)
'''If you don't have room in your home directory''' for container images, currently the recommended approach is to use node-local storage under <code>/data/tmp</code>. This results in each node pulling each container image, but images will be saved across workflows.
You can set that up for all your workflows with:
echo 'export SINGULARITY_CACHEDIR="/data/tmp/$(whoami)/cache/singularity"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="/data/tmp/$(whoami)/cache/miniwdl"' >>~/.bashrc
Then '''log out and log back in again''', to apply the changes.
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, go to your user directory under <code>/private/groups</code>, and make a directory to work in.
cd /private/groups/YOURGROUPNAME/YOURUSERNAME
mkdir workflow-test
cd workflow-test
Next, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's put it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
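Since inputs files are ordinary JSON, you can sanity-check yours with Python's built-in pretty-printer before starting a run. This sketch recreates the file from above so the snippet stands alone:

```shell
# Recreate the inputs file and confirm it parses as valid JSON
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
python3 -m json.tool inputs.json
```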
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --time=02:00:00 --partition=medium --pty bash -i
This will start a new shell that can run for 2 hours; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
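The <code>\u00f3</code>-style sequences in the printed JSON are standard JSON escapes, not mangled output; any JSON parser turns them back into real characters. For example:

```shell
# Parse a JSON document containing a \u escape and print the decoded result
DECODED=$(python3 -c 'import json, sys; print(json.loads(sys.argv[1])[0])' \
    '["Hello, Gershom \u0160arlota!"]')
echo "$DECODED"   # prints: Hello, Gershom Šarlota!
```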
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil at a shared directory (which it will create) where it can store information that the cluster nodes can read. Until Toil [https://github.com/DataBiosphere/toil/issues/4775 gets support for data file caching on Slurm], we will also need the <code>--caching false</code> option. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
Additionally, since [https://github.com/DataBiosphere/toil/issues/4686 Toil can't manage Slurm partitions itself], we will use the <code>TOIL_SLURM_ARGS</code> environment variable to tell Toil how long jobs should be allowed to run for (2 hours) and what [[Slurm Queues (Partitions) and Resource Management | partition]] they should go in.
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --caching false --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
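While the workflow ticks, you can watch the Slurm jobs that Toil submits from another shell on the head node. The guard below just makes the snippet safe to run on machines without Slurm:

```shell
# Show your own queued and running Slurm jobs (falls back gracefully off-cluster)
QUEUE=$(squeue -u "$(whoami)" 2>/dev/null || echo "squeue not available; run this on the head node")
echo "$QUEUE"
```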
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
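For example, a hypothetical inputs file for this workflow only has to supply <code>item_count</code>; the other keys shown here override a default and the optional input, and could be omitted entirely:

```shell
# Write an example inputs file for the FizzBuzz workflow and check that it parses
cat >fizzbuzz_inputs_example.json <<'EOF'
{
    "FizzBuzz.item_count": 15,
    "FizzBuzz.to_buzz": 7,
    "FizzBuzz.fizzbuzz_override": "FizzBuzz!"
}
EOF
python3 -m json.tool fizzbuzz_inputs_example.json >/dev/null && echo "valid JSON"
```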
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, but only on the iterations where we don't make a noise instead.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We're also going to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, which may suggest that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need this, but you do need it in 1.1, and Toil doesn't yet send your outputs anywhere if you don't have one, so we're going to make one. We need to collect together all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --caching false --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Restarting the Workflow==
If you think your workflow failed from a transient problem (such as a Docker image not being available) that you have fixed, and you ran the workflow with <code>--jobStore</code> set manually to a directory that persists between attempts, you can add <code>--restart</code> to your workflow command and make Toil try again. It will pick up from where it left off and rerun any failed tasks and then the rest of the workflow.
This ''will not'' pick up any changes to your WDL source code files; those are read once at the beginning and not re-read on restart.
If restarting the workflow doesn't help, you may need to move on to more advanced debugging techniques.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs and the output of commands run from WDL are reproduced like this.
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which happens either when the command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stderr follows:
And
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stdout follows:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
If you would like individual task logs to be saved separately for later reference, you can use the <code>--writeLogs</code> option to specify a directory to store them. For more information, see [https://toil.readthedocs.io/en/latest/wdl/running.html#managing-workflow-logs the Toil documentation of workflow task logs].
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this.
=== Automatically Fetching Input Files ===
The <code>toil debug-job</code> command has a <code>--retrieveTaskDirectory</code> option that lets you dump out a directory with all the files that a failing WDL task would use. You can use it like:
toil debug-job ./jobstore WDLTaskJob --retrieveTaskDirectory dumpdir
If there are multiple failing tasks, you might need to replace <code>WDLTaskJob</code> with the name of one of the failing jobs. See [https://toil.readthedocs.io/en/latest/running/debugging.html#fetching-job-inputs the Toil documentation on retrieving files] for more on how to use this command.
=== Manually Finding Input Files ===
If you can't use <code>toil debug-job</code>, you might need to manually dig through the job store for files. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to look for the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
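Instead of a web-based decoder, you can do the same decoding at the command line with Python's standard library. Using the URI from the example above:

```shell
URI='toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam'
# URL-decode the Toil file URI
DECODED=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote(sys.argv[1]))' "$URI")
# The job-store-relative path is everything after the last colon
echo "${DECODED##*:}"
```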
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil but not yet released. To try the current development version, you can install it like this (if you installed Toil with pipx, <code>pipx install --force</code> with the same Git URL should work similarly):
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
You should upgrade Toil. [https://github.com/DataBiosphere/toil/commit/ff6bf60ab798a675c20156c749817c4313644b96 Since Toil 6.1.0], Toil no longer issues this warning, and just puts up with bad <code>XDG_RUNTIME_DIR</code> settings.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
===How do I delete files in WDL?===
WDL doesn't have a built-in way to delete files; if you run a task that deletes a file, it will still exist in Toil's job store.
Toil [https://github.com/DataBiosphere/toil/commit/2de6eea2cc2e688b53062a98687445f0cca56669 recently gained support] for deleting files at the '''end''' of WDL workflows. So if you have a large file that you only need for part of your workflow, consider writing that part as a separate sub-<code>workflow</code> and invoking it with <code>call</code>. Then the file will be cleaned up when the child workflow ends, leaving more space for files created in the parent workflow.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work at that you can install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to any of the machines with access to the Phoenix cluster. These interactive nodes are fairly large machines that can do some work locally, but you will still want to run larger workflows on the actual cluster. For this tutorial, we will use <code>emerald.prism</code> as our login node.
To connect to the cluster:
1. Connect to the VPN.
2. SSH to <code>emerald.prism</code>. At the command line, run:
ssh emerald.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@emerald.prism
The first time you connect, you will see a message like:
The authenticity of host 'emerald.prism (10.50.1.67)' can't be established.
ED25519 key fingerprint is SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>emerald.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows.
Toil is written in Python, and the modern way to install Python command line tools is with pipx. So [https://pipx.pypa.io/latest/installation/ install pipx]:
python3 -m pip install --user pipx
python3 -m pipx ensurepath
This may instruct you to '''log out and log back in''' or take some other action to adopt the new <code>PATH</code> settings.
When installing Toil, you need to specify that you want WDL support. To do this, you can run:
pipx install 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will also need to install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pipx install 'toil[wdl,aws,google]'
To change what extras are used when you have an existing Toil installation, you will need to use the <code>--force</code> option.
This will install Toil under the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>. The <code>python3 -m pipx ensurepath</code> command should have added <code>~/.local/bin</code> to your <code>PATH</code> environment variable, so your shell can find these commands.
If you see something from <code>pipx</code> like:
- cwltoil (symlink missing or pointing to unexpected location)
Then <code>pipx uninstall toil</code>, remove the offending file from <code>~/.local/bin</code>, and try again.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can run <code>pipx upgrade toil</code>, or repeat the <code>pipx install</code> command above with <code>--force</code>.
==Configuring your Phoenix Environment==
'''Do not try to store data in your home directory on Phoenix!''' The home directories are meant for code and programs. Any data worth running a workflow on should be in a directory under <code>/private/groups</code>. You will probably need to email the admins to get added to a group so you can create a directory to work in somewhere under <code>/private/groups</code>. Usually you would end up with <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code>.
Remember this path; we will need it later.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. On the Phoenix cluster, however, we do have a shared filesystem, so we should configure Toil to use it for caching the Docker container images used for running workflow steps. Since these image files can be large, and the home directory quota is only 30 GB, you might not be able to keep them in your home directory.
We would like to be able to store these on the cluster's large storage array, under <code>/private/groups</code>. However, Toil needs to use file locks in these directories to prevent simultaneous Singularity calls from producing internal Singularity errors, and Ceph currently has [https://tracker.ceph.com/issues/65607 a bug where these file locking operations can freeze the Ceph servers].
If you have '''a small number of container images''' that will fit in your home directory, you can keep them there. [https://github.com/DataBiosphere/toil/commit/cb0b291bb7f6212bfe69221dd9f09d72f83e92fb Since Toil 6.1.0], this is the default behavior and you don't need to do anything. (Unless you previously set <code>SINGULARITY_CACHEDIR</code> or <code>MINIWDL__SINGULARITY__IMAGE_CACHE</code>, in which case you need to unset them.)
'''If you don't have room in your home directory''' for container images, currently the recommended approach is to use node-local storage under <code>/data/tmp</code>. This results in each node pulling each container image, but images will be saved across workflows.
You can set that up for all your workflows with:
echo 'export SINGULARITY_CACHEDIR="/data/tmp/$(whoami)/cache/singularity"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="/data/tmp/$(whoami)/cache/miniwdl"' >>~/.bashrc
Then '''log out and log back in again''', to apply the changes.
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, go to your user directory under <code>/private/groups</code>, and make a directory to work in.
cd /private/groups/YOURGROUPNAME/YOURUSERNAME
mkdir workflow-test
cd workflow-test
Next, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
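For illustration, the value does not have to be a relative path; a URL or an absolute path works too. Both of these (with a hypothetical URL and a placeholder path) would be valid inputs files:

```shell
# A URL works as an input File value (hypothetical URL, for illustration only):
echo '{"hello_caller.who": "https://example.org/names.txt"}' >inputs_url.json
# So does an absolute path:
echo '{"hello_caller.who": "/private/groups/YOURGROUPNAME/YOURUSERNAME/workflow-test/names.txt"}' >inputs_abs.json
# Both are ordinary JSON files:
python3 -m json.tool inputs_url.json
```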
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --time=02:00:00 --partition=medium --pty bash -i
This will start a new shell that can run for 2 hours; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
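The <code>\u00f3</code>-style sequences are ordinary JSON string escapes for non-ASCII characters, not mangled output. Python strings use the same escape syntax, so you can check what one decodes to:

```shell
# \u00f3 is the JSON escape for "ó"; Python uses the same escape syntax.
python3 -c 'print("Mridula Resurrecci\u00f3n")'
# Mridula Resurrección
```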
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. Until Toil [https://github.com/DataBiosphere/toil/issues/4775 gets support for data file caching on Slurm], we will also need the <code>--caching false</code> option. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
Additionally, since [https://github.com/DataBiosphere/toil/issues/4686 Toil can't manage Slurm partitions itself], we will use the <code>TOIL_SLURM_ARGS</code> environment variable to tell Toil how long jobs should be allowed to run for (2 hours) and what [[Slurm Queues (Partitions) and Resource Management | partition]] they should go in.
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --caching false --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
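To make this concrete, here are two example inputs files for this workflow (hypothetical file names): <code>item_count</code> must always be set, while the defaulted and optional inputs may be omitted or overridden.

```shell
# item_count has no default and is not optional, so it is required:
echo '{"FizzBuzz.item_count": 15}' >fizzbuzz_minimal.json
# Defaults and optional inputs can still be overridden explicitly:
echo '{"FizzBuzz.item_count": 15, "FizzBuzz.to_buzz": 7, "FizzBuzz.fizzbuzz_override": "FizzBuzz!"}' >fizzbuzz_custom.json
```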
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
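WDL's <code>range(n)</code> behaves like Python's <code>range()</code>: it produces the integers <code>0</code> through <code>n - 1</code>. You can see the zero-based counting (which is why the next step adds 1) with a quick Python check:

```shell
# WDL range(item_count) counts from 0, like Python's range():
python3 -c 'print(list(range(5)))'
# [0, 1, 2, 3, 4]
```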
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, but only in the branch where we don't produce a noise instead.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements, and to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, which may suggest that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need one, but you do need it in 1.1, and Toil doesn't actually send your outputs anywhere yet if you don't have one, so we're going to make one. We need to collect together all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --caching false --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
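As a sanity check, the <code>FizzBuzz.fizzbuzz_results</code> array saved in <code>fizzbuzz_out.json</code> should match what this stand-alone shell rendition of the same rules prints (a sketch for checking only, not part of the workflow; it hardcodes the defaults <code>to_fizz = 3</code> and <code>to_buzz = 5</code> and our <code>item_count</code> of 20):

```shell
# Plain-shell FizzBuzz over 20 items, mirroring the workflow's logic.
for i in $(seq 1 20); do
    if [ $((i % 3)) -eq 0 ] && [ $((i % 5)) -eq 0 ]; then
        echo "FizzBuzz"
    elif [ $((i % 3)) -eq 0 ]; then
        echo "Fizz"
    elif [ $((i % 5)) -eq 0 ]; then
        echo "Buzz"
    else
        echo "$i"
    fi
done
```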
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Restarting the Workflow==
If you think your workflow failed from a transient problem (such as a Docker image not being available) that you have fixed, and you ran the workflow with <code>--jobStore</code> set manually to a directory that persists between attempts, you can add <code>--restart</code> to your workflow command and make Toil try again. It will pick up from where it left off and rerun any failed tasks and then the rest of the workflow.
This ''will not'' pick up any changes to your WDL source code files; those are read once at the beginning and not re-read on restart.
If restarting the workflow doesn't help, you may need to move on to more advanced debugging techniques.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs and the output of commands run from WDL are reproduced like this.
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code, which will happen when either the command line command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
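Exit statuses are easy to experiment with in a shell, if you want to see the success/failure distinction in action (a quick illustration, unrelated to any particular tool):

```shell
# A zero exit status means success; anything else is a failure.
sh -c 'echo "all good"; exit 0'
echo "status: $?"
# status: 0
sh -c 'echo "something broke" >&2; exit 1' || echo "status: $?"
# status: 1
```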
Go up higher in the log until you find lines that look like:
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stderr follows:
And
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stdout follows:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
If you would like individual task logs to be saved separately for later reference, you can use the <code>--writeLogs</code> option to specify a directory to store them. For more information, see [https://toil.readthedocs.io/en/latest/wdl/running.html#managing-workflow-logs the Toil documentation of workflow task logs].
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this.
=== Automatically Fetching Input Files ===
The <code>toil debug-job</code> command has a <code>--retrieveTaskDirectory</code> option that lets you dump out a directory with all the files that a failing WDL task would use. You can use it like:
toil debug-job ./jobstore WDLTaskJob --retrieveTaskDirectory dumpdir
If there are multiple failing tasks, you might need to replace <code>WDLTaskJob</code> with the name of one of the failing jobs. See [https://toil.readthedocs.io/en/latest/running/debugging.html#fetching-job-inputs the Toil documentation on retrieving files] for more on how to use this command.
=== Manually Finding Input Files ===
If you can't use <code>toil debug-job</code>, you might need to manually dig through the job store for files. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
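Turning the file ID into an on-disk location is plain path concatenation; a minimal Python sketch, using the example values from the log above:

```python
import os.path

# Example values copied from the log excerpt above.
job_store = "/private/groups/patenlab/anovak/jobstore"
file_id = ("files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/"
           "file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam")

# The file ID is a path relative to the job store directory, so joining
# them gives the location where the stored file can be found.
full_path = os.path.join(job_store, file_id)
print(full_path)
```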
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
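The decode-and-split steps can also be done without a web tool; a small Python sketch using only the standard library:

```python
from urllib.parse import unquote

# The virtualized file URI copied from the log line above.
uri = ("toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob"
       "%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351"
       "%2FSample.chr14.bam/Sample.chr14.bam")

# URL-decode the escapes (%3A -> ':', %2F -> '/') ...
decoded = unquote(uri)
# ... then keep everything after the last colon, which is the path
# relative to the job store.
relative_path = decoded.rsplit(":", 1)[-1]
print(relative_path)
```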
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
You should upgrade Toil. [https://github.com/DataBiosphere/toil/commit/ff6bf60ab798a675c20156c749817c4313644b96 Since Toil 6.1.0], Toil no longer issues this warning, and just puts up with bad <code>XDG_RUNTIME_DIR</code> settings.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
===How do I delete files in WDL?===
WDL doesn't have a built-in way to delete files; if you run a task that deletes a file, it will still exist in Toil's job store storage.
Toil [https://github.com/DataBiosphere/toil/commit/2de6eea2cc2e688b53062a98687445f0cca56669 recently gained support] for deleting files at the 'end' of WDL workflows. So if you have a large file that you only need for part of your workflow, consider writing that part as a separate sub-<code>workflow</code> and invoking it with <code>call</code>. Then the file will be cleaned up when the child workflow ends, leaving more space for files created in the parent workflow.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to any of the machines with access to the Phoenix cluster. These interactive nodes are fairly large machines that can do some work locally, but you will still want to run larger workflows on the actual cluster. For this tutorial, we will use <code>emerald.prism</code> as our login node.
To connect to the cluster:
1. Connect to the VPN.
2. SSH to <code>emerald.prism</code>. At the command line, run:
ssh emerald.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@emerald.prism
The first time you connect, you will see a message like:
The authenticity of host 'emerald.prism (10.50.1.67)' can't be established.
ED25519 key fingerprint is SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide whether it is talking to the genuine <code>emerald.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows.
Toil is written in Python, and the modern way to install Python command line tools is with pipx. So [https://pipx.pypa.io/latest/installation/ install pipx]:
python3 -m pip install --user pipx
python3 -m pipx ensurepath
This may instruct you to '''log out and log back in''' or take some other action to adopt the new <code>PATH</code> settings.
When installing Toil, you need to specify that you want WDL support. To do this, you can run:
pipx install 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pipx install 'toil[wdl,aws,google]'
To change what extras are used when you have an existing Toil installation, you will need to use the <code>--force</code> option.
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>. The <code>python3 -m pipx ensurepath</code> command should have added <code>~/.local/bin</code> to your <code>PATH</code> environment variable, to ensure you can find these commands.
If you see something from <code>pipx</code> like:
- cwltoil (symlink missing or pointing to unexpected location)
Then <code>pipx uninstall toil</code>, remove the offending file from <code>~/.local/bin</code>, and try again.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can run <code>pipx upgrade toil</code>, or repeat the <code>pipx install</code> command above with the <code>--force</code> option.
==Configuring your Phoenix Environment==
'''Do not try to store data in your home directory on Phoenix!''' The home directories are meant for code and programs. Any data worth running a workflow on should be in a directory under <code>/private/groups</code>. You will probably need to email the admins to get added to a group so you can create a directory to work in somewhere under <code>/private/groups</code>. Usually you would end up with <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code>.
Remember this path; we will need it later.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. On the Phoenix cluster, however, we do have a shared filesystem, so we should configure Toil to use it for caching the Docker container images used for running workflow steps. Since these files can be large, and the home directory quota is only 30 GB, you might not be able to keep them in your home directory.
We would like to be able to store these on the cluster's large storage array, under <code>/private/groups</code>. However, Toil needs to use file locks in these directories to prevent simultaneous Singularity calls from producing internal Singularity errors, and Ceph currently has [https://tracker.ceph.com/issues/65607 a bug where these file locking operations can freeze the Ceph servers].
If you have '''a small number of container images''' that will fit in your home directory, you can keep them there. [https://github.com/DataBiosphere/toil/commit/cb0b291bb7f6212bfe69221dd9f09d72f83e92fb Since Toil 6.1.0], this is the default behavior and you don't need to do anything. (Unless you previously set <code>SINGULARITY_CACHEDIR</code> or <code>MINIWDL__SINGULARITY__IMAGE_CACHE</code>, in which case you need to unset them.)
'''If you don't have room in your home directory''' for container images, currently the recommended approach is to use node-local storage under <code>/data/tmp</code>. This results in each node pulling each container image, but images will be saved across workflows.
You can set that up for all your workflows with:
echo 'export SINGULARITY_CACHEDIR="/data/tmp/$(whoami)/cache/singularity"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="/data/tmp/$(whoami)/cache/miniwdl"' >>~/.bashrc
Then '''log out and log back in again''', to apply the changes.
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, go to your user directory under <code>/private/groups</code>, and make a directory to work in.
cd /private/groups/YOURGROUPNAME/YOURUSERNAME
mkdir workflow-test
cd workflow-test
Next, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
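If you are generating inputs programmatically, writing the JSON with a library avoids quoting mistakes; a minimal Python sketch that produces the same file as the <code>echo</code> command above:

```python
import json

# Key: the workflow name, a dot, then the input name.
# Value: the file path, relative to the inputs file's location.
inputs = {"hello_caller.who": "./names.txt"}

with open("inputs.json", "w") as f:
    json.dump(inputs, f)
```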
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --time=02:00:00 --partition=medium --pty bash -i
This will start a new shell that can run for 2 hours; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
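Those <code>\uXXXX</code> sequences are standard JSON string escapes; any JSON parser will turn them back into the original characters, as this Python sketch shows:

```python
import json

# A fragment of the output JSON, with the escapes as printed by Toil.
raw = '["Hello, Mridula Resurrecci\\u00f3n!", "Hello, Gershom \\u0160arlota!"]'

# json.loads converts \u00f3 back to 'ó' and \u0160 back to 'Š'.
messages = json.loads(raw)
print(messages[0])
```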
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. Until Toil [https://github.com/DataBiosphere/toil/issues/4775 gets support for data file caching on Slurm], we will also need the <code>--caching false</code> option. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
Additionally, since [https://github.com/DataBiosphere/toil/issues/4686 Toil can't manage Slurm partitions itself], we will use the <code>TOIL_SLURM_ARGS</code> environment variable to tell Toil how long jobs should be allowed to run for (2 hours) and what [[Slurm Queues (Partitions) and Resource Management | partition]] they should go in.
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --caching false --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
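If it helps, the scatter above corresponds to this Python sketch (an illustration only, not anything Toil runs): the body executes once per element, and each per-iteration variable collects into a list.

```python
# Python analogue of the WDL scatter above, with a hypothetical
# item_count of 5: range(item_count), then i + 1 for each element.
item_count = 5
numbers = list(range(item_count))          # WDL: range(item_count)
one_based = [i + 1 for i in numbers]       # WDL: Int one_based = i + 1
print(one_based)
```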
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
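To see how the null-variable trick plays out, here is a plain Python analogue of this logic (an illustration only; the names mirror the WDL above, and <code>select_first()</code> is mimicked as a helper):

```python
def select_first(values):
    # Mimics WDL's select_first(): return the first non-null value.
    for v in values:
        if v is not None:
            return v
    raise ValueError("all values were null")

def classify(one_based, to_fizz=3, to_buzz=5, fizzbuzz_override=None):
    # In WDL, a variable declared inside a conditional that did not run
    # is null; here we model that with None.
    fizz = "Fizz" if one_based % to_fizz == 0 else None
    buzz = "Buzz" if one_based % to_buzz == 0 else None
    fizzbuzz = None
    if fizz is not None and buzz is not None:
        fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
    # Combine with select_first, taking the first non-null result.
    return select_first([fizzbuzz, fizz, buzz, str(one_based)])

print([classify(n) for n in range(1, 16)])
```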
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, but only when we didn't produce a noise instead.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> function returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
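The effect of <code>read_string(stdout())</code> can be pictured with this Python sketch (a mimic based on the spec behavior linked above, not Toil's actual code): the captured output is read back with trailing newline characters removed.

```python
def read_string(text):
    # Mimics WDL's read_string() applied to a captured stdout file:
    # the contents with trailing newline characters stripped.
    return text.rstrip("\r\n")

# `echo 42` writes "42\n" to standard output; read_string gives "42".
print(read_string("42\n"))
```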
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements, and to tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs to let you control each task's resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, which may suggest that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need this, but you do need it in WDL 1.1, and Toil doesn't actually send your outputs anywhere yet if you don't have one, so we're going to make one. We need to collect all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --caching false --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Restarting the Workflow==
If you think your workflow failed from a transient problem (such as a Docker image not being available) that you have fixed, and you ran the workflow with <code>--jobStore</code> set manually to a directory that persists between attempts, you can add <code>--restart</code> to your workflow command and make Toil try again. It will pick up from where it left off and rerun any failed tasks and then the rest of the workflow.
This ''will not'' pick up any changes to your WDL source code files; those are read once at the beginning and not re-read on restart.
If restarting the workflow doesn't help, you may need to move on to more advanced debugging techniques.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs and the output of commands run from WDL are reproduced like this.
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code. That happens either when the command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stderr follows:
And
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stdout follows:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
If you would like individual task logs to be saved separately for later reference, you can use the <code>--writeLogs</code> option to specify a directory to store them. For more information, see [https://toil.readthedocs.io/en/latest/wdl/running.html#managing-workflow-logs the Toil documentation of workflow task logs].
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this.
=== Automatically Fetching Input Files ===
The <code>toil debug-job</code> command has a <code>--retrieveTaskDirectory</code> option that lets you dump out a directory with all the files that a failing WDL task would use. You can use it like:
toil debug-job ./jobstore WDLTaskJob --retrieveTaskDirectory dumpdir
If there are multiple failing tasks, you might need to replace <code>WDLTaskJob</code> with the name of one of the failing jobs. See [https://toil.readthedocs.io/en/latest/running/debugging.html#fetching-job-inputs the Toil documentation on retrieving files] for more on how to use this command.
=== Manually Finding Input Files ===
If you can't use <code>toil debug-job</code>, you might need to manually dig through the job store for files. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
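The decode-and-split can also be done in a couple of lines of Python instead of a web tool, using the example URI above:

```python
from urllib.parse import unquote

# Example toilfile: URI copied from a Toil debug log line.
uri = ("toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob"
       "%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351"
       "%2FSample.chr14.bam/Sample.chr14.bam")

decoded = unquote(uri)
# The job-store-relative path is everything after the last colon.
relative_path = decoded.rpartition(":")[2]
```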
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
You should upgrade Toil. [https://github.com/DataBiosphere/toil/commit/ff6bf60ab798a675c20156c749817c4313644b96 Since Toil 6.1.0], Toil no longer issues this warning, and just puts up with bad <code>XDG_RUNTIME_DIR</code> settings.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
===How do I delete files in WDL?===
WDL doesn't have a built-in way to delete files; even if you run a task that deletes a file, the file will still exist in Toil's job store.
Toil [https://github.com/DataBiosphere/toil/commit/2de6eea2cc2e688b53062a98687445f0cca56669 recently gained support] for deleting files at the ''end'' of WDL workflows. So if you have a large file that you only need for part of your workflow, consider writing that part as a separate sub-<code>workflow</code> and invoking it with <code>call</code>. Then the file will be cleaned up when the child workflow ends, leaving more space for files created in the parent workflow.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
8ee3020ce5c7efef0f7b283fc471e4d51307be37
665
664
2025-03-18T18:43:57Z
Anovak
4
/* How do I delete files in WDL? */
wikitext
text/x-wiki
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to reach the cluster, you need access to the VPN (Virtual Private Network) system that we use to let people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to any of the machines with access to the Phoenix cluster. These interactive nodes are fairly large machines that can do some work locally, but you will still want to run larger workflows on the actual cluster. For this tutorial, we will use <code>emerald.prism</code> as our login node.
To connect to the cluster:
1. Connect to the VPN.
2. SSH to <code>emerald.prism</code>. At the command line, run:
ssh emerald.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@emerald.prism
The first time you connect, you will see a message like:
The authenticity of host 'emerald.prism (10.50.1.67)' can't be established.
ED25519 key fingerprint is SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>emerald.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows.
Toil is written in Python, and the modern way to install Python command line tools is with pipx. So [https://pipx.pypa.io/latest/installation/ install pipx]:
python3 -m pip install --user pipx
python3 -m pipx ensurepath
This may instruct you to '''log out and log back in''' or take some other action to adopt the new <code>PATH</code> settings.
When installing Toil, you need to specify that you want WDL support. To do this, you can run:
pipx install 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pipx install 'toil[wdl,aws,google]'
To change what extras are used when you have an existing Toil installation, you will need to use the <code>--force</code> option.
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>. The <code>python3 -m pipx ensurepath</code> command should have added the <code>~/.local</code> directory to your <code>PATH</code> environment variable, to ensure you can find these commands.
If you see something from <code>pipx</code> like:
- cwltoil (symlink missing or pointing to unexpected location)
Then <code>pipx uninstall toil</code>, remove the offending file from <code>~/.local/bin</code>, and try again.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can run <code>pipx upgrade toil</code>.
==Configuring your Phoenix Environment==
'''Do not try and store data in your home directory on Phoenix!''' The home directories are meant for code and programs. Any data worth running a workflow on should be in a directory under <code>/private/groups</code>. You will probably need to email the admins to get added to a group so you can create a directory to work in somewhere under <code>/private/groups</code>. Usually you would end up with <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code>.
Remember this path; we will need it later.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. On the Phoenix cluster, though, we have a shared filesystem, so we should configure Toil to use it for caching the Docker container images used for running workflow steps. Since these images can be large, and the home directory quota is only 30 GB, you might not be able to keep them in your home directory.
We would like to be able to store these on the cluster's large storage array, under <code>/private/groups</code>. However, Toil needs to use file locks in these directories to prevent simultaneous Singularity calls from producing internal Singularity errors, and Ceph currently has [https://tracker.ceph.com/issues/65607 a bug where these file locking operations can freeze the Ceph servers].
If you have '''a small number of container images''' that will fit in your home directory, you can keep them there. [https://github.com/DataBiosphere/toil/commit/cb0b291bb7f6212bfe69221dd9f09d72f83e92fb Since Toil 6.1.0], this is the default behavior and you don't need to do anything. (Unless you previously set <code>SINGULARITY_CACHEDIR</code> or <code>MINIWDL__SINGULARITY__IMAGE_CACHE</code>, in which case you need to unset them.)
'''If you don't have room in your home directory''' for container images, currently the recommended approach is to use node-local storage under <code>/data/tmp</code>. This results in each node pulling each container image, but images will be saved across workflows.
You can set that up for all your workflows with:
echo 'export SINGULARITY_CACHEDIR="/data/tmp/$(whoami)/cache/singularity"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="/data/tmp/$(whoami)/cache/miniwdl"' >>~/.bashrc
Then '''log out and log back in again''', to apply the changes.
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, go to your user directory under <code>/private/groups</code>, and make a directory to work in.
cd /private/groups/YOURGROUPNAME/YOURUSERNAME
mkdir workflow-test
cd workflow-test
Next, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>:
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
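Since the inputs file is plain JSON, you can also generate it from a script when you have many samples to run. A minimal Python sketch, using this workflow's <code>hello_caller.who</code> input:

```python
import json

# Keys are "<workflow name>.<input name>"; values for File inputs are
# paths (relative to the inputs file) or URLs.
inputs = {"hello_caller.who": "./names.txt"}
inputs_json = json.dumps(inputs)
```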
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --time=02:00:00 --partition=medium --pty bash -i
This will start a new shell that can run for 2 hours; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
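Those <code>\u00f3</code> sequences are standard JSON string escapes for non-ASCII characters; any JSON parser turns them back into the original characters. For example, in Python:

```python
import json

# A fragment like toil-wdl-runner's JSON output, with an escaped character.
output_line = '{"hello_caller.messages": ["Hello, Gershom \\u0160arlota!"]}'
outputs = json.loads(output_line)
decoded = outputs["hello_caller.messages"][0]
```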
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. Until Toil [https://github.com/DataBiosphere/toil/issues/4775 gets support for data file caching on Slurm], we will also need the <code>--caching false</code> option. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
Additionally, since [https://github.com/DataBiosphere/toil/issues/4686 Toil can't manage Slurm partitions itself], we will use the <code>TOIL_SLURM_ARGS</code> environment variable to tell Toil how long jobs should be allowed to run for (2 hours) and what [[Slurm Queues (Partitions) and Resource Management | partition]] they should go in.
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --caching false --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
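If it helps, a scatter behaves like a comprehension over the input array. A hypothetical Python analogue of the code above:

```python
item_count = 5
numbers = list(range(item_count))     # Array[Int] numbers = range(item_count)
one_based = [i + 1 for i in numbers]  # each scatter iteration declares one Int
```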
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
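A variable declared inside a conditional that did not execute is <code>null</code> when referenced later, and <code>select_first()</code> picks the first non-null value. A hypothetical Python analogue, using <code>None</code> for <code>null</code>:

```python
def select_first(values):
    # First non-None value; WDL select_first() errors if all values are null.
    return next(v for v in values if v is not None)

# For one_based = 15 with to_fizz = 3 and to_buzz = 5, all three branches ran:
fizz = "Fizz"
fizzbuzz = "FizzBuzz"
buzz = "Buzz"
result = select_first([fizzbuzz, fizz, buzz, None])
```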
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access; the call only runs when we aren't producing a Fizz/Buzz noise instead.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
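The trailing-newline stripping that <code>read_string()</code> does can be sketched as a hypothetical Python analogue (operating on the captured text rather than a file):

```python
def read_string(text):
    # Read back a command's captured output, dropping trailing newline
    # characters, roughly as WDL read_string() does to the stdout() file.
    return text.rstrip("\r\n")

the_string = read_string("42\n")
```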
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We'll also tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, which may suggest that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, in WDL 1.0 you aren't supposed to need this, but you do need it in 1.1, and Toil doesn't actually send your outputs anywhere yet if you don't have one, so we're going to make one. We need to collect together all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
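If you want to sanity-check the logic without a WDL runner, the scatter behaves like a loop whose per-iteration variables gather into arrays. Here is a hypothetical Python mirror of the workflow (purely illustrative, using <code>None</code> for WDL's <code>null</code>):

```python
def fizzbuzz(item_count, to_fizz=3, to_buzz=5, fizzbuzz_override=None):
    # Mirror of the WDL scatter body. Variables set inside a WDL
    # conditional that didn't run are null; we model that with None.
    results = []
    for i in range(item_count):          # scatter (i in numbers)
        n = i + 1                        # Int one_based = i + 1
        fizz = "Fizz" if n % to_fizz == 0 else None
        fizzbuzz_str = None
        if n % to_fizz == 0 and n % to_buzz == 0:
            # select_first([fizzbuzz_override, "FizzBuzz"])
            fizzbuzz_str = (fizzbuzz_override
                            if fizzbuzz_override is not None else "FizzBuzz")
        buzz = "Buzz" if n % to_buzz == 0 else None
        plain = str(n) if fizz is None and buzz is None else None
        # String result = select_first([fizzbuzz, fizz, buzz, ...])
        results.append(next(v for v in (fizzbuzz_str, fizz, buzz, plain)
                            if v is not None))
    return results  # outside the scatter, result is an Array[String]

print(fizzbuzz(15))  # the last entry is 'FizzBuzz'
```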
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --caching false --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Restarting the Workflow==
If you think your workflow failed from a transient problem (such as a Docker image not being available) that you have fixed, and you ran the workflow with <code>--jobStore</code> set manually to a directory that persists between attempts, you can add <code>--restart</code> to your workflow command and make Toil try again. It will pick up from where it left off and rerun any failed tasks and then the rest of the workflow.
This ''will not'' pick up any changes to your WDL source code files; those are read once at the beginning and not re-read on restart.
If restarting the workflow doesn't help, you may need to move on to more advanced debugging techniques.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs and the output of commands run from WDL are reproduced like this.
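If you need to pull those per-job logs back out of a big main log, something like this Python sketch works, assuming the markers appear on their own lines as shown above:

```python
def extract_job_logs(log_text):
    # Collect the lines bracketed by Toil's =========> / <=========
    # markers in the main log.
    logs, current, inside = [], [], False
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped == "=========>":
            inside, current = True, []
        elif stripped == "<=========":
            inside = False
            logs.append("\n".join(current))
        elif inside:
            current.append(line)
    return logs

main_log = "...\n=========>\nToil job log is here\n<=========\n..."
print(extract_job_logs(main_log))  # ['Toil job log is here']
```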
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code. That happens either when the command is written wrong, or when the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stderr follows:
And
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stdout follows:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
If you would like individual task logs to be saved separately for later reference, you can use the <code>--writeLogs</code> option to specify a directory to store them. For more information, see [https://toil.readthedocs.io/en/latest/wdl/running.html#managing-workflow-logs the Toil documentation of workflow task logs].
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this.
=== Automatically Fetching Input Files ===
The <code>toil debug-job</code> command has a <code>--retrieveTaskDirectory</code> option that lets you dump out a directory with all the files that a failing WDL task would use. You can use it like:
toil debug-job ./jobstore WDLTaskJob --retrieveTaskDirectory dumpdir
If there are multiple failing tasks, you might need to replace <code>WDLTaskJob</code> with the name of one of the failing jobs. See [https://toil.readthedocs.io/en/latest/running/debugging.html#fetching-job-inputs the Toil documentation on retrieving files] for more on how to use this command.
=== Manually Finding Input Files ===
If you can't use <code>toil debug-job</code>, you might need to manually dig through the job store for files. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
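Those two steps (URL-decode, then take everything after the last colon) are easy to script. Here is an illustrative Python helper, not part of Toil itself:

```python
from urllib.parse import unquote

def jobstore_relative_path(toilfile_uri):
    # Step 1: URL-decode the toilfile: URI.
    # Step 2: keep everything after the last colon, which is the
    # path relative to the job store directory.
    return unquote(toilfile_uri).rsplit(":", 1)[-1]

uri = ("toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob"
       "%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351"
       "%2FSample.chr14.bam/Sample.chr14.bam")
print(jobstore_relative_path(uri))
```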
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
You should upgrade Toil. [https://github.com/DataBiosphere/toil/commit/ff6bf60ab798a675c20156c749817c4313644b96 Since Toil 6.1.0], Toil no longer issues this warning, and just puts up with bad <code>XDG_RUNTIME_DIR</code> settings.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
===How do I delete files in WDL?===
WDL doesn't have a built-in way to delete files; if you run a task that deletes a file, it will still exist in Toil's job store storage.
Toil [https://github.com/DataBiosphere/toil/commit/2de6eea2cc2e688b53062a98687445f0cca56669 recently gained support] for deleting files at the ''end'' of WDL workflows. So if you have a large file that you only need for part of your workflow, consider writing the part that creates and uses it as a separate sub-<code>workflow</code> and invoking it with <code>call</code>. Then the file will be cleaned up when the child workflow ends, leaving more space for files created in the parent workflow.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
9d70b51c4d35c9688e28b394a4c1dc5e875a63c8
666
665
2025-03-18T20:46:04Z
Anovak
4
/* Writing your own workflow */
wikitext
text/x-wiki
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to any of the machines with access to the Phoenix cluster. These interactive nodes are fairly large machines that can do some work locally, but you will still want to run larger workflows on the actual cluster. For this tutorial, we will use <code>emerald.prism</code> as our login node.
To connect to the cluster:
1. Connect to the VPN.
2. SSH to <code>emerald.prism</code>. At the command line, run:
ssh emerald.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@emerald.prism
The first time you connect, you will see a message like:
The authenticity of host 'emerald.prism (10.50.1.67)' can't be established.
ED25519 key fingerprint is SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>emerald.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows.
Toil is written in Python, and the modern way to install Python command line tools is with pipx. So [https://pipx.pypa.io/latest/installation/ install pipx]:
python3 -m pip install --user pipx
python3 -m pipx ensurepath
This may instruct you to '''log out and log back in''' or take some other action to adopt the new <code>PATH</code> settings.
When installing Toil, you need to specify that you want WDL support. To do this, you can run:
pipx install 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pipx install 'toil[wdl,aws,google]'
To change what extras are used when you have an existing Toil installation, you will need to use the <code>--force</code> option.
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>. The <code>python3 -m pipx ensurepath</code> command should have added the <code>~/.local/bin</code> directory to your <code>PATH</code> environment variable, to ensure you can find these commands.
If you see something from <code>pipx</code> like:
- cwltoil (symlink missing or pointing to unexpected location)
Then <code>pipx uninstall toil</code>, remove the offending file from <code>~/.local/bin</code>, and try again.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can repeat the <code>pipx install</code> command above.
==Configuring your Phoenix Environment==
'''Do not try and store data in your home directory on Phoenix!''' The home directories are meant for code and programs. Any data worth running a workflow on should be in a directory under <code>/private/groups</code>. You will probably need to email the admins to get added to a group so you can create a directory to work in somewhere under <code>/private/groups</code>. Usually you would end up with <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code>.
Remember this path; we will need it later.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. However, since these files can be large, and the home directory quota is only 30 GB, you might not be able to keep them in your home directory.
We would like to be able to store these on the cluster's large storage array, under <code>/private/groups</code>. However, Toil needs to use file locks in these directories to prevent simultaneous Singularity calls from producing internal Singularity errors, and Ceph currently has [https://tracker.ceph.com/issues/65607 a bug where these file locking operations can freeze the Ceph servers].
If you have '''a small number of container images''' that will fit in your home directory, you can keep them there. [https://github.com/DataBiosphere/toil/commit/cb0b291bb7f6212bfe69221dd9f09d72f83e92fb Since Toil 6.1.0], this is the default behavior and you don't need to do anything. (Unless you previously set <code>SINGULARITY_CACHEDIR</code> or <code>MINIWDL__SINGULARITY__IMAGE_CACHE</code>, in which case you need to unset them.)
'''If you don't have room in your home directory''' for container images, currently the recommended approach is to use node-local storage under <code>/data/tmp</code>. This results in each node pulling each container image, but images will be saved across workflows.
You can set that up for all your workflows with:
echo 'export SINGULARITY_CACHEDIR="/data/tmp/$(whoami)/cache/singularity"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="/data/tmp/$(whoami)/cache/miniwdl"' >>~/.bashrc
Then '''log out and log back in again''', to apply the changes.
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, go to your user directory under <code>/private/groups</code>, and make a directory to work in.
cd /private/groups/YOURGROUPNAME/YOURUSERNAME
mkdir workflow-test
cd workflow-test
Next, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
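If you find yourself generating inputs files programmatically, a tiny helper can build the dotted keys for you; this one is just for illustration and is not part of any toolkit:

```python
import json

def inputs_json(workflow_name, **inputs):
    # Each key in a WDL inputs file is "<workflow name>.<input name>".
    return json.dumps({workflow_name + "." + name: value
                       for name, value in inputs.items()})

print(inputs_json("hello_caller", who="./names.txt"))
# {"hello_caller.who": "./names.txt"}
```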
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --time=02:00:00 --partition=medium --pty bash -i
This will start a new shell that can run for up to 2 hours.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
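To see why, note that an escape like <code>\u00f3</code> is just JSON's ASCII-safe spelling of a character; any JSON parser restores it:

```python
import json

# \u00f3 in the output JSON is JSON's ASCII-safe spelling of the
# character "ó"; parsing the JSON turns it back into the real character.
name = json.loads('"local_run/Mridula Resurrecci\\u00f3n.txt"')
print(name)  # local_run/Mridula Resurrección.txt
```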
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. Until Toil [https://github.com/DataBiosphere/toil/issues/4775 gets support for data file caching on Slurm], we will also need the <code>--caching false</code> option. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
Additionally, since [https://github.com/DataBiosphere/toil/issues/4686 Toil can't manage Slurm partitions itself], we will use the <code>TOIL_SLURM_ARGS</code> environment variable to tell Toil how long jobs should be allowed to run for (2 hours) and what [[Slurm Queues (Partitions) and Resource Management | partition]] they should go in.
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --caching false --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
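These rules map neatly onto a Python function signature, if that analogy helps; this sketch is purely illustrative:

```python
# Inputs with defaults are overridable, optional (?) inputs default to
# null (None), and inputs with neither MUST be supplied by the caller.
def fizzbuzz_inputs(item_count, to_fizz=3, to_buzz=5,
                    fizzbuzz_override=None):
    return {"item_count": item_count, "to_fizz": to_fizz,
            "to_buzz": to_buzz, "fizzbuzz_override": fizzbuzz_override}

print(fizzbuzz_inputs(20)["to_fizz"])  # 3

try:
    fizzbuzz_inputs()  # omitting item_count fails, like omitting a
except TypeError:      # required input in a WDL inputs file
    print("item_count is required")
```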
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine if we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we use an array of it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
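In Python terms, <code>select_first()</code> could be sketched like this (an illustration of the semantics, not WDL's actual implementation):

```python
def select_first(values):
    # WDL select_first(): return the first non-null value in the array.
    for v in values:
        if v is not None:
            return v
    raise ValueError("all values were null")

# With fizzbuzz_override unset (null/None), the literal default wins:
print(select_first([None, "FizzBuzz"]))       # FizzBuzz
# With it set, the override wins:
print(select_first(["Fzz-Bzz", "FizzBuzz"]))  # Fzz-Bzz
```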
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, at least on iterations where we didn't make a noise instead.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
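As a rough Python analogy of that capture step (an illustration of the assumed behavior, not the WDL implementation):

```python
import os
import tempfile

def read_string(path):
    # Sketch of WDL's read_string(): read a whole file as one string,
    # stripping the trailing newline that `echo` appends.
    with open(path) as f:
        return f.read().rstrip("\r\n")

# Simulate the stdout file of a task whose command was `echo 7`:
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("7\n")
print(repr(read_string(f.name)))  # '7' -- the trailing newline is gone
os.remove(f.name)
```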
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We'll also tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, optionally suggesting that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, WDL 1.0 doesn't require one, but WDL 1.1 does, and Toil doesn't actually send your outputs anywhere yet if you don't have one, so we're going to write one. We need to collect all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> block, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:22.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
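As a sanity check, the whole workflow's logic, including that gather-into-an-array behavior, can be sketched in plain Python (an analogy only; the names mirror the WDL above, but the real execution is done by the WDL runner):

```python
def fizzbuzz(item_count, to_fizz=3, to_buzz=5, fizzbuzz_override=None):
    """Python analogy of the FizzBuzz workflow above (not how Toil runs it)."""
    results = []  # plays the role of the gathered Array[String] from `result`
    for i in range(item_count):  # the scatter body runs once per number
        one_based = i + 1
        # Variables from un-executed conditionals read as null (None here).
        fizz = "Fizz" if one_based % to_fizz == 0 else None
        buzz = "Buzz" if one_based % to_buzz == 0 else None
        fizzbuzz_str = None
        if fizz is not None and buzz is not None:
            # select_first([fizzbuzz_override, "FizzBuzz"])
            fizzbuzz_str = fizzbuzz_override if fizzbuzz_override is not None else "FizzBuzz"
        # select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
        results.append(next(v for v in (fizzbuzz_str, fizz, buzz, str(one_based))
                            if v is not None))
    return results

print(fizzbuzz(15))  # the 15th entry is 'FizzBuzz'
```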
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --slurmTime 00:10:00 --caching false --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Restarting the Workflow==
If you think your workflow failed from a transient problem (such as a Docker image not being available) that you have fixed, and you ran the workflow with <code>--jobStore</code> set manually to a directory that persists between attempts, you can add <code>--restart</code> to your workflow command and make Toil try again. It will pick up from where it left off and rerun any failed tasks and then the rest of the workflow.
This ''will not'' pick up any changes to your WDL source code files; those are read once at the beginning and not re-read on restart.
If restarting the workflow doesn't help, you may need to move on to more advanced debugging techniques.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs and the output of commands run from WDL are reproduced like this.
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code. That happens either when the command is written wrong, or when the tool you are running detects and reports an error.
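The exit-status convention is the same one used everywhere on Unix; for example, you can reproduce it with Python's standard library:

```python
import subprocess

# A command script that errors under `set -e` stops early and exits nonzero,
# which is exactly what Toil reports as a failed task command.
result = subprocess.run(
    ["bash", "-c", "set -e; false; echo never reached"],
    capture_output=True, text=True,
)
print(result.returncode)  # 1 -- a failing (nonzero) exit status
print(result.stdout)      # empty: the echo never ran
```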
Go up higher in the log until you find lines that look like:
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stderr follows:
And
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stdout follows:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
If you would like individual task logs to be saved separately for later reference, you can use the <code>--writeLogs</code> option to specify a directory to store them. For more information, see [https://toil.readthedocs.io/en/latest/wdl/running.html#managing-workflow-logs the Toil documentation of workflow task logs].
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this.
=== Automatically Fetching Input Files ===
The <code>toil debug-job</code> command has a <code>--retrieveTaskDirectory</code> option that lets you dump out a directory with all the files that a failing WDL task would use. You can use it like:
toil debug-job ./jobstore WDLTaskJob --retrieveTaskDirectory dumpdir
If there are multiple failing tasks, you might need to replace <code>WDLTaskJob</code> with the name of one of the failing jobs. See [https://toil.readthedocs.io/en/latest/running/debugging.html#fetching-job-inputs the Toil documentation on retrieving files] for more on how to use this command.
=== Manually Finding Input Files ===
If you can't use <code>toil debug-job</code>, you might need to manually dig through the job store for files. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
==More Ways of Finding Files==
Sometimes, a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to try and find the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
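You can also URL-decode the URI locally with Python's standard library instead of a web tool:

```python
from urllib.parse import unquote

# The percent-encoded Toil file URI copied from the log line above.
uri = ("toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob"
       "%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351"
       "%2FSample.chr14.bam/Sample.chr14.bam")
decoded = unquote(uri)
print(decoded)
# The job-store-relative path is everything after the last colon:
print(decoded.rsplit(":", 1)[-1])
```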
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
You should upgrade Toil. [https://github.com/DataBiosphere/toil/commit/ff6bf60ab798a675c20156c749817c4313644b96 Since Toil 6.1.0], Toil no longer issues this warning, and just puts up with bad <code>XDG_RUNTIME_DIR</code> settings.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
===How do I delete files in WDL?===
WDL doesn't have a built-in way to delete files; if you run a task that deletes a file, it will still exist in Toil's job store storage.
Toil [https://github.com/DataBiosphere/toil/commit/2de6eea2cc2e688b53062a98687445f0cca56669 recently gained support] for deleting files at the ''end'' of WDL workflows. So if you have a large file that you only need for part of your workflow, consider writing the part that creates and uses it as a separate sub-<code>workflow</code> and invoking it with <code>call</code>. Then the file will be cleaned up when the child workflow ends, leaving more space for files created in the parent workflow.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]
'''Tutorial: Getting Started with WDL Workflows on Phoenix'''
Instead of giant shell scripts that only work on one grad student's laptop, modern, reusable bioinformatics experiments should be written as workflows, in a language like the Workflow Description Language (WDL). Workflows succinctly describe their own execution requirements, and which pieces depend on which other pieces, making your analyses reproducible by people other than you.
Workflows are also easily scaled up and down: you can develop and test your workflow on a small test data set on one machine, and then run it on real data on the cluster without having to worry about whether the right tasks will run in the right order.
This tutorial will help you get started writing and running workflows. The '''Phoenix Cluster Setup''' section is specifically for the UC Santa Cruz Genomics Institute's Phoenix Slurm cluster. The other sections are broadly applicable to other environments. By the end, you will be able to run workflows on Slurm with [https://toil.readthedocs.io/en/latest/ Toil], write your own workflows in WDL, and debug workflows when something goes wrong.
=Phoenix Cluster Setup=
Before we begin, you will need a computer to work at, which you are able to install software on, and the ability to connect to other machines over SSH.
==Getting VPN access==
We are going to work on the Phoenix cluster, but this cluster is kept behind the Prism firewall, where all of our controlled-access data lives. So, to get access to the cluster, you need to get access to the VPN (Virtual Private Network) system that we use to allow people through the firewall.
To get VPN access, follow the instructions at https://giwiki.gi.ucsc.edu/index.php/Requirement_for_users_to_get_GI_VPN_access. Note that this process involves making a one-on-one appointment with one of our admins to help you set up your VPN client, so make sure to do it in advance of when you need to use the cluster.
==Connecting to Phoenix==
Once you have VPN access, you can connect to any of the machines with access to the Phoenix cluster. These interactive nodes are fairly large machines that can do some work locally, but you will still want to run larger workflows on the actual cluster. For this tutorial, we will use <code>emerald.prism</code> as our login node.
To connect to the cluster:
1. Connect to the VPN.
2. SSH to <code>emerald.prism</code>. At the command line, run:
ssh emerald.prism
If your username on the cluster (say, <code>flastname</code>) is different than your username on your computer (which might be <code>firstname</code>), you might instead have to run:
ssh flastname@emerald.prism
The first time you connect, you will see a message like:
The authenticity of host 'emerald.prism (10.50.1.67)' can't be established.
ED25519 key fingerprint is SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is your computer asking you to help it decide if it is talking to the genuine <code>emerald.prism</code>, and not an imposter. You will want to make sure that the "key fingerprint" is indeed <code>SHA256:8hJQShO6jhrym9UVyMldKsKOnOFtWRChgjK5cZNhkAI</code>. If it is not, someone (probably the GI sysadmins, but possibly a cabal of hackers) has replaced the head node, and you should verify that this was supposed to happen. If the fingerprints do match, type <code>yes</code> to accept and remember that the server is who it says it is.
==Installing Toil with WDL support==
Once you are on the head node, you can install Toil, a program for running workflows.
Toil is written in Python, and the modern way to install Python command line tools is with pipx. So [https://pipx.pypa.io/latest/installation/ install pipx]:
python3 -m pip install --user pipx
python3 -m pipx ensurepath
This may instruct you to '''log out and log back in''' or take some other action to adopt the new <code>PATH</code> settings.
When installing Toil, you need to specify that you want WDL support. To do this, you can run:
pipx install 'toil[wdl]'
If you also want to use AWS S3 <code>s3://</code> and/or Google <code>gs://</code> URLs for data, you will need to also install Toil with the <code>aws</code> and <code>google</code> extras, respectively:
pipx install 'toil[wdl,aws,google]'
To change what extras are used when you have an existing Toil installation, you will need to use the <code>--force</code> option.
This will install Toil in the <code>.local</code> directory inside your home directory, which we write as <code>~/.local</code>. The program to run WDL workflows, <code>toil-wdl-runner</code>, will be at <code>~/.local/bin/toil-wdl-runner</code>. The <code>python3 -m pipx ensurepath</code> command should have added the <code>~/.local/bin</code> directory to your <code>PATH</code> environment variable, so that your shell can find these commands.
If you see something from <code>pipx</code> like:
- cwltoil (symlink missing or pointing to unexpected location)
Then <code>pipx uninstall toil</code>, remove the offending file from <code>~/.local/bin</code>, and try again.
To make sure it worked, you can run:
toil-wdl-runner --help
If everything worked correctly, it will print a long list of the various option flags that the <code>toil-wdl-runner</code> command supports.
If you ever want to upgrade Toil to a new release, you can run <code>pipx upgrade toil</code>, or repeat the <code>pipx install</code> command above with <code>--force</code>.
==Configuring your Phoenix Environment==
'''Do not try and store data in your home directory on Phoenix!''' The home directories are meant for code and programs. Any data worth running a workflow on should be in a directory under <code>/private/groups</code>. You will probably need to email the admins to get added to a group so you can create a directory to work in somewhere under <code>/private/groups</code>. Usually you would end up with <code>/private/groups/YOURGROUPNAME/YOURUSERNAME</code>.
Remember this path; we will need it later.
==Configuring Toil for Phoenix==
Toil is set up to work in a large number of different environments, and doesn't necessarily rely on the existence of things like a shared cluster filesystem. However, on the Phoenix cluster, we have a shared filesystem, and so we should configure Toil to use it for caching the Docker container images used for running workflow steps. But since these files can be large, and the home directory quota is only 30 GB, you might not be able to keep them in your home directory.
We would like to be able to store these on the cluster's large storage array, under <code>/private/groups</code>. However, Toil needs to use file locks in these directories to prevent simultaneous Singularity calls from producing internal Singularity errors, and Ceph currently has [https://tracker.ceph.com/issues/65607 a bug where these file locking operations can freeze the Ceph servers].
If you have '''a small number of container images''' that will fit in your home directory, you can keep them there. [https://github.com/DataBiosphere/toil/commit/cb0b291bb7f6212bfe69221dd9f09d72f83e92fb Since Toil 6.1.0], this is the default behavior and you don't need to do anything. (Unless you previously set <code>SINGULARITY_CACHEDIR</code> or <code>MINIWDL__SINGULARITY__IMAGE_CACHE</code>, in which case you need to unset them.)
'''If you don't have room in your home directory''' for container images, currently the recommended approach is to use node-local storage under <code>/data/tmp</code>. This results in each node pulling each container image, but images will be saved across workflows.
You can set that up for all your workflows with:
echo 'export SINGULARITY_CACHEDIR="/data/tmp/$(whoami)/cache/singularity"' >>~/.bashrc
echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="/data/tmp/$(whoami)/cache/miniwdl"' >>~/.bashrc
Then '''log out and log back in again''', to apply the changes.
=Running an existing workflow=
First, let's use <code>toil-wdl-runner</code> to run an existing demonstration workflow. We're going to use the MiniWDL self-test workflow, from the [https://github.com/chanzuckerberg/miniwdl#readme MiniWDL project].
First, go to your user directory under <code>/private/groups</code>, and make a directory to work in.
cd /private/groups/YOURGROUPNAME/YOURUSERNAME
mkdir workflow-test
cd workflow-test
Next, download the workflow. While Toil can run workflows directly from a URL, your commands will be shorter if the workflow is available locally.
wget https://raw.githubusercontent.com/DataBiosphere/toil/d686daca091849e681d2f3f3a349001ca83d2e3e/src/toil/test/wdl/miniwdl_self_test/self_test.wdl
==Preparing an input file==
Near the top of the WDL file, there's a section like this:
workflow hello_caller {
input {
File who
}
This means that there is a workflow named <code>hello_caller</code> in this file, and it takes as input a file variable named <code>who</code>. For this particular workflow, the file is supposed to have a list of names, one per line, and the workflow is going to greet each one.
So first, we have to make that list of names. Let's make it in <code>names.txt</code>
echo "Mridula Resurrección" >names.txt
echo "Gershom Šarlota" >>names.txt
echo "Ritchie Ravi" >>names.txt
Then, we need to create an ''inputs file'', which is a JSON (JavaScript Object Notation) file describing what value to use for each input when running the workflow. (You can also reach down into the workflow and override individual task settings, but for now we'll just set the inputs.) So, make another file next to <code>names.txt</code> that references it by relative path, like this:
echo '{"hello_caller.who": "./names.txt"}' >inputs.json
Note that, for a key, we're using the workflow name, a dot, and then the input name. For a value, we're using a quoted string of the filename, relative to the location of the inputs file. Absolute paths and URLs will also work for files; more information on the input file syntax is in [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#json-input-format the JSON Input Format section of the WDL specification].
==Testing at small scale on a single machine==
We are now ready to run the workflow!
You don't want to run workflows on the head node. So, use Slurm to get an interactive session on one of the cluster's worker nodes, by running:
srun -c 2 --mem 8G --time=02:00:00 --partition=medium --pty bash -i
This will start a new shell that can run for 2 hours; to leave it and go back to the head node you can use <code>exit</code>.
In your new shell, run this Toil command:
toil-wdl-runner self_test.wdl inputs.json -o local_run
This will, by default, use the <code>single_machine</code> Toil "batch system" to run all of the workflow's tasks locally. Output will be sent to a new directory named <code>local_run</code>.
This will print a lot of logging to standard error, and to standard output it will print:
{"hello_caller.message_files": ["local_run/Mridula Resurrecci\u00f3n.txt", "local_run/Gershom \u0160arlota.txt", "local_run/Ritchie Ravi.txt"], "hello_caller.messages": ["Hello, Mridula Resurrecci\u00f3n!", "Hello, Gershom \u0160arlota!", "Hello, Ritchie Ravi!"]}
The <code>local_run</code> directory will contain the described text files (with Unicode escape sequences like <code>\u00f3</code> replaced by their corresponding characters), each containing a greeting for the corresponding person.
To leave your interactive Slurm session and return to the head node, use <code>exit</code>.
==Running at larger scale==
Back on the head node, let's prepare a larger run. Greeting 3 people isn't cool; let's greet one hundred people!
Go get this handy list of people and cut it to length:
wget https://gist.githubusercontent.com/smsohan/ae142977b5099dba03f6e0d909108e97/raw/f6e319b1a0f6a0f87f93f73b3acd24795361aeba/1000_names.txt
head -n100 1000_names.txt >100_names.txt
And make a new inputs file:
echo '{"hello_caller.who": "./100_names.txt"}' >inputs_big.json
Now, we will run the same workflow, but with the new inputs, and against the Slurm cluster.
To run against the Slurm cluster, we need to use the <code>--jobStore</code> option to point Toil to a shared directory it can create where it can store information that the cluster nodes can read. Until Toil [https://github.com/DataBiosphere/toil/issues/4775 gets support for data file caching on Slurm], we will also need the <code>--caching false</code> option. We will add the <code>--batchLogsDir</code> option to tell Toil to store the logs from the individual Slurm jobs in a folder on the shared filesystem. We'll also use the <code>-m</code> option to save the output JSON to a file instead of printing it.
Additionally, since [https://github.com/DataBiosphere/toil/issues/4686 Toil can't manage Slurm partitions itself], we will use the <code>TOIL_SLURM_ARGS</code> environment variable to tell Toil how long jobs should be allowed to run for (2 hours) and what [[Slurm Queues (Partitions) and Resource Management | partition]] they should go in.
mkdir -p logs
export TOIL_SLURM_ARGS="--time=02:00:00 --partition=medium"
toil-wdl-runner --jobStore ./big_store --batchSystem slurm --caching false --batchLogsDir ./logs self_test.wdl inputs_big.json -o slurm_run -m slurm_run.json
This will tick for a while, but eventually you should end up with 100 greeting files in the <code>slurm_run</code> directory.
=Writing your own workflow=
In addition to running existing workflows, you probably want to be able to write your own. This part of the tutorial will walk you through writing a workflow. We're going to write a workflow for [https://en.wikipedia.org/wiki/Fizz_buzz Fizz Buzz].
==Writing the file==
===Version===
All WDL files need to start with a <code>version</code> statement (unless they are very old <code>draft-2</code> files). Toil supports <code>draft-2</code>, WDL 1.0, and WDL 1.1, while Cromwell (another popular WDL runner used on Terra) supports only <code>draft-2</code> and 1.0.
So let's start a new WDL 1.0 workflow. Open up a file named <code>fizzbuzz.wdl</code> and start with a version statement:
version 1.0
===Workflow Block===
Then, add an empty <code>workflow</code> named <code>FizzBuzz</code>.
version 1.0
workflow FizzBuzz {
}
===Input Block===
Workflows usually need some kind of user input, so let's give our workflow an <code>input</code> section.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
}
Notice that each input has a type, a name, and an optional default value. If the type ends in <code>?</code>, the value is optional, and it may be <code>null</code>. If an input is ''not'' optional, and there is no default value, then the user's inputs file ''must'' specify a value for it in order for the workflow to run.
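For example, an inputs file for this workflow could set the required input, override a default, and fill in the optional input. The values here are just for illustration; in Python terms:

```python
import json

inputs = {
    "FizzBuzz.item_count": 15,           # required: no default in the WDL
    "FizzBuzz.to_buzz": 7,               # overrides the default of 5
    "FizzBuzz.fizzbuzz_override": "Zap", # optional; omit it to leave it null
}
# This JSON is what you would save as the inputs file for toil-wdl-runner.
print(json.dumps(inputs, indent=4))
```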
===Body===
Now we'll start on the body of the workflow, to be inserted just after the inputs section.
The first thing we're going to need to do is create an array of all the numbers up to the <code>item_count</code>. We can do this by calling the WDL <code>range()</code> function, and assigning the result to an <code>Array[Int]</code> variable.
Array[Int] numbers = range(item_count)
WDL 1.0 has [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#standard-library a wide variety of functions in its standard library], and WDL 1.1 has even more.
===Scattering===
Once we create an array of all the numbers, we can use a <code>scatter</code> to operate on each. WDL does not have loops; instead it has scatters, which work a bit like a <code>map()</code> in Python. The body of the scatter runs for each value in the input array, all in parallel. We're going to increment all the numbers, since FizzBuzz starts at 1 but WDL <code>range()</code> starts at 0.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
}
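To see why a scatter works like a <code>map()</code>, here is a rough Python analogy (this is Python, not WDL; <code>item_count</code> stands in for the workflow input of the same name):

```python
# Rough Python analogy of the WDL scatter above: the scatter body runs once
# per element, and the declared variables become arrays of all the results.
item_count = 5  # stands in for the workflow's item_count input

numbers = list(range(item_count))     # WDL: Array[Int] numbers = range(item_count)
one_based = [i + 1 for i in numbers]  # WDL scatter body: Int one_based = i + 1
print(one_based)                      # [1, 2, 3, 4, 5]
```

The important difference is that in WDL the scatter iterations may all run in parallel, rather than one after another.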
===Conditionals===
Inside the body of the scatter, we are going to put some conditionals to determine whether we should produce <code>"Fizz"</code>, <code>"Buzz"</code>, or <code>"FizzBuzz"</code>. To support our <code>fizzbuzz_override</code>, we build an array containing it and a default value, and use the WDL <code>select_first()</code> function to find the first non-null value in that array.
Each execution of a scatter is allowed to declare variables, and outside the scatter those variables are combined into arrays of all the results. But each variable can be declared only ''once'' in the scatter, even with conditionals. So we're going to use <code>select_first()</code> at the end and take advantage of variables from un-executed conditionals being <code>null</code>.
Note that WDL supports conditional ''expressions'' with a <code>then</code> and an <code>else</code>, but conditional ''statements'' only have a body, not an <code>else</code> branch. If you need an else you will have to check the negated condition.
So first, let's handle the special cases.
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
}
}
===Calling Tasks===
Now for the normal numbers, we need to convert our number into a string. In WDL 1.1, and in WDL 1.0 on Cromwell, you can use a <code>${}</code> substitution syntax in quoted strings anywhere, not just in command line commands. Toil technically will support this too, but it's not in the spec, and the tutorial needs an excuse for you to call a task. So we're going to insert a call to a <code>stringify_number</code> task, to be written later.
To call a task (or another workflow), we use a <code>call</code> statement and give it some inputs. Then we can fish the output values out of the task with <code>.</code> access, but only if the task actually ran (here, only when we didn't make a noise instead).
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
We can put the code into the workflow now, and set about writing the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
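The decision logic we just assembled, with null variables from un-executed conditionals and <code>select_first()</code> picking the winner, can be sketched in Python (this is an analogy, not WDL; <code>None</code> plays the role of <code>null</code>, and <code>str()</code> stands in for the <code>stringify_number</code> task):

```python
# Rough Python analogy of the workflow's scatter body: variables declared in
# conditionals that didn't run are null (None here), and select_first()
# returns the first non-null value.
def select_first(values):
    """Return the first non-None value, like WDL's select_first()."""
    return next(v for v in values if v is not None)

def fizzbuzz_word(one_based, to_fizz=3, to_buzz=5, fizzbuzz_override=None):
    fizz = "Fizz" if one_based % to_fizz == 0 else None
    buzz = "Buzz" if one_based % to_buzz == 0 else None
    fizzbuzz = None
    if fizz is not None and buzz is not None:
        fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
    # str() stands in for the stringify_number task call.
    return select_first([fizzbuzz, fizz, buzz, str(one_based)])

print([fizzbuzz_word(n) for n in range(1, 16)])
```

Running this prints the familiar FizzBuzz sequence for 1 through 15, ending in <code>FizzBuzz</code>.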
===Writing Tasks===
Our task should go after the workflow in the file. It looks a lot like a workflow except it uses <code>task</code>.
task stringify_number {
}
We're going to want it to take in an integer <code>the_number</code>, and we're going to want it to output a string <code>the_string</code>. So let's fill that in in <code>input</code> and <code>output</code> sections.
task stringify_number {
input {
Int the_number
}
# ???
output {
String the_string # = ???
}
}
Now, unlike workflows, tasks can have a <code>command</code> section, which gives a command to run. This section is now usually set off with triple angle brackets, and inside it you can use <code>~{}</code>, that is, Bash-like substitution but with a tilde, to place WDL variables into your command script. So let's add a command that will echo back the number so we can see it as a string.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string # = ???
}
}
Now we need to capture the result of the command script. The WDL <code>stdout()</code> function returns a WDL <code>File</code> containing the standard output printed by the task's command. We want to read that back into a string, which we can do with the WDL <code>read_string()</code> function (which also [https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#string-read_stringstringfile removes trailing newlines]).
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
}
We're also going to want to add a <code>runtime</code> section to our task, to specify resource requirements. We'll also tell it to run in a Docker container, to make sure that absolutely nothing can go wrong with our delicate <code>echo</code> command. In a real workflow, you probably want to set up optional inputs for all the tasks to let you control the resource requirements, but here we will just hardcode them.
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:24.04"
}
}
The <code>disks</code> section is a little weird; it isn't in the WDL spec, but Toil supports Cromwell-style strings that ask for a <code>local-disk</code> of a certain number of gigabytes, optionally requesting that it be <code>SSD</code> storage.
Then we can put our task into our WDL file:
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:24.04"
}
}
===Output Block===
Now the only thing missing is a workflow-level <code>output</code> section. Technically, you aren't supposed to need one in WDL 1.0, but you do need one in WDL 1.1, and Toil doesn't actually send your outputs anywhere yet if you don't have one, so we're going to make one. We need to collect all the strings that came out of the different tasks in our scatter into an <code>Array[String]</code>. We'll add the <code>output</code> section at the end of the <code>workflow</code> section, above the task.
version 1.0
workflow FizzBuzz {
input {
# How many FizzBuzz numbers do we want to make?
Int item_count
# Every multiple of this number, we produce "Fizz"
Int to_fizz = 3
# Every multiple of this number, we produce "Buzz"
Int to_buzz = 5
# Optional replacement for the string to print when a multiple of both
String? fizzbuzz_override
}
Array[Int] numbers = range(item_count)
scatter (i in numbers) {
Int one_based = i + 1
if (one_based % to_fizz == 0) {
String fizz = "Fizz"
if (one_based % to_buzz == 0) {
String fizzbuzz = select_first([fizzbuzz_override, "FizzBuzz"])
}
}
if (one_based % to_buzz == 0) {
String buzz = "Buzz"
}
if (one_based % to_fizz != 0 && one_based % to_buzz != 0) {
# Just a normal number.
call stringify_number {
input:
the_number = one_based
}
}
String result = select_first([fizzbuzz, fizz, buzz, stringify_number.the_string])
}
output {
Array[String] fizzbuzz_results = result
}
}
task stringify_number {
input {
Int the_number
}
command <<<
# This is a Bash script.
# So we should do good Bash script things like stop on errors
set -e
# Now print our number as a string
echo ~{the_number}
>>>
output {
String the_string = read_string(stdout())
}
runtime {
cpu: 1
memory: "0.5 GB"
disks: "local-disk 1 SSD"
docker: "ubuntu:24.04"
}
}
Because the <code>result</code> variable is defined inside a <code>scatter</code>, when we reference it outside the scatter we see it as being an array.
==Running the Workflow==
Now all that remains is to run the workflow! As before, make an inputs file to specify the workflow inputs:
echo '{"FizzBuzz.item_count": 20}' >fizzbuzz.json
Then run it on the cluster with Toil:
toil-wdl-runner --jobStore ./fizzbuzz_store --batchSystem slurm --slurmTime 00:10:00 --caching false --batchLogsDir ./logs fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
Or locally:
toil-wdl-runner fizzbuzz.wdl fizzbuzz.json -o fizzbuzz_out -m fizzbuzz_out.json
=Debugging Workflows=
Sometimes, your workflow won't work. Try these ideas for figuring out what is going wrong.
==Restarting the Workflow==
If you think your workflow failed from a transient problem (such as a Docker image not being available) that you have fixed, and you ran the workflow with <code>--jobStore</code> set manually to a directory that persists between attempts, you can add <code>--restart</code> to your workflow command and make Toil try again. It will pick up from where it left off and rerun any failed tasks and then the rest of the workflow.
This ''will not'' pick up any changes to your WDL source files; those are read once at the beginning of the run and are not re-read on restart.
If restarting the workflow doesn't help, you may need to move on to more advanced debugging techniques.
==Debugging Options==
When debugging a workflow, make sure to run the workflow with <code>--logDebug</code>, to set the log level to <code>DEBUG</code>, and with <code>--jobStore /some/path/to/a/shared/directory/it/can/create</code> so that the stored files shipped between jobs are in a place you can access them.
When debug logging is on, the log from every Toil job is inserted in the main Toil log between these markers:
=========>
Toil job log is here
<=========
Normally, only the logs of failing jobs and the output of commands run from WDL are reproduced like this.
==Reading the Log==
When a WDL workflow fails, you are likely to see a message like this:
WDL.runtime.error.CommandFailed: task command failed with exit status 1
[2023-07-16T16:23:54-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-15.prism
This means that the command line command specified by one of your WDL tasks exited with a failing (i.e. nonzero) exit code. That happens either when the command is written wrong, or when the error detection code in the tool you are trying to run detects and reports an error.
Go up higher in the log until you find lines that look like:
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stderr follows:
And
[2024-01-16T20:12:19-0500] [Thread-3 (statsAndLoggingAggregator)] [I] [toil.statsAndLogging] hello_caller.0.hello.stdout follows:
These will be followed by the standard error and standard output log data from the task's command. There may be useful information (such as an error message from the underlying tool) in there.
If you would like individual task logs to be saved separately for later reference, you can use the <code>--writeLogs</code> option to specify a directory to store them. For more information, see [https://toil.readthedocs.io/en/latest/wdl/running.html#managing-workflow-logs the Toil documentation of workflow task logs].
==Reproducing Problems==
When trying to fix a failing step, it is useful to be able to run a command outside of Toil or WDL that might reproduce the problem. In addition to getting the standard output and standard error logs as described above, you may also need input files for your tool in order to do this.
=== Automatically Fetching Input Files ===
The <code>toil debug-job</code> command has a <code>--retrieveTaskDirectory</code> option that lets you dump out a directory with all the files that a failing WDL task would use. You can use it like:
toil debug-job ./jobstore WDLTaskJob --retrieveTaskDirectory dumpdir
If there are multiple failing tasks, you might need to replace <code>WDLTaskJob</code> with the name of one of the failing jobs. See [https://toil.readthedocs.io/en/latest/running/debugging.html#fetching-job-inputs the Toil documentation on retrieving files] for more on how to use this command.
=== Manually Finding Input Files ===
If you can't use <code>toil debug-job</code>, you might need to manually dig through the job store for files. In the log of your failing Toil task, look for lines like this:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-4f886176ab8344baaf17dc72fc445445/toplog.sh' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmprwhi6h3q/toplog.sh'
[2023-07-16T16:23:54-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam' to path '/data/tmp/c3d51c0611b9511da167528976fef714/9b0e/467f/tmpjyksfoko/Sample.bam'
...
The <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam</code> part is a Toil file ID, and it is a relative path from your <code>--jobStore</code> value to where the file is stored on disk. So if you ran the workflow with <code>--jobStore /private/groups/patenlab/anovak/jobstore</code>, you would look for this file at:
/private/groups/patenlab/anovak/jobstore/files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-1bb5d92ae8f3413eb82fe8ef88686bf6/Sample.bam
==More Ways of Finding Files==
Sometimes a step might not fail, but you still might want to see the files it is using as input. If you have the job store path, you can use the <code>find</code> command to search for the files by name. For example, if you want to look at <code>Sample.bam</code>, you can look for it like this:
find /path/to/the/jobstore -name "Sample.bam"
If you want to find files that were ''uploaded'' from a job, look for lines like this in the job's log:
[2023-07-16T15:58:39-0700] [MainThread] [D] [toil.wdl.wdltoil] Virtualized /data/tmp/2846b6012e3e5535add03b363950dd78/cb23/197c/work/bamPerChrs/Sample.chr14.bam as WDL file toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam
You can take the <code>toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351%2FSample.chr14.bam/Sample.chr14.bam</code> URI and URL-decode it with, for example, [https://www.urldecoder.io/], getting this:
toilfile:2703483274:0:files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam
Then you can take the part after the last colon, <code>files/for-job/kind-WDLTaskJob/instance-b4c5x6hq/file-c4e4f1b16ddf4c2ab92c2868421f3351/Sample.chr14.bam/Sample.chr14.bam</code>, and that is the path relative to the job store where this file can be found.
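Instead of a web URL-decoder, you can also do the decoding in Python. This sketch uses the standard library's <code>urllib.parse.unquote()</code> on the example URI from the log line above:

```python
# URL-decode a toilfile: URI from the Toil log and pull out the path
# relative to the job store (everything after the last colon).
from urllib.parse import unquote

uri = ("toilfile:2703483274%3A0%3Afiles%2Ffor-job%2Fkind-WDLTaskJob"
       "%2Finstance-b4c5x6hq%2Ffile-c4e4f1b16ddf4c2ab92c2868421f3351"
       "%2FSample.chr14.bam/Sample.chr14.bam")

decoded = unquote(uri)  # turns %3A into ':' and %2F into '/'
relative_path = decoded.rsplit(":", 1)[1]
print(relative_path)
```

Joining <code>relative_path</code> onto your <code>--jobStore</code> directory gives the on-disk location of the file.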
==Using Development Versions of Toil==
Sometimes, bugs will be fixed in the development version of Toil, but not released yet. To try the current development version of Toil, you can install it like this:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl,aws,google]'
If you want to use a particular branch or commit, like <code>aaa451b320fc115b3563ced25cb501301cf86f90</code>, you can do:
pip install --upgrade --user 'git+https://github.com/DataBiosphere/toil.git@aaa451b320fc115b3563ced25cb501301cf86f90#egg=toil[wdl,aws,google]'
==Frequently Asked Questions==
===I am getting warnings about <code>XDG_RUNTIME_DIR</code>===
You may be seeing warnings that <code>XDG_RUNTIME_DIR is set to nonexistent directory /run/user/$UID; your environment may be out of spec!</code>
You should upgrade Toil. [https://github.com/DataBiosphere/toil/commit/ff6bf60ab798a675c20156c749817c4313644b96 Since Toil 6.1.0], Toil no longer issues this warning, and just puts up with bad <code>XDG_RUNTIME_DIR</code> settings.
===Toil said it was <code>Redirecting logging</code> somewhere, but I can't find that file!===
The Toil worker process for each job will say that it is <code>Redirecting logging to /data/tmp/somewhere/worker_log.txt</code>, and when running in single machine mode these messages go to the main Toil log.
The Toil worker logs are automatically cleaned up when the worker finishes. If you want to see the individual worker logs in the Toil log, use the <code>--logDebug</code> option to Toil.
If you are looking for the log for a worker process that did not finish (i.e. that crashed), make sure to look on the machine that the worker actually ran on, not on the head node.
===How do I delete files in WDL?===
WDL doesn't have a built-in way to delete files; if you run a task that deletes a file, it will still exist in Toil's job store storage.
Toil [https://github.com/DataBiosphere/toil/commit/2de6eea2cc2e688b53062a98687445f0cca56669 recently gained support] for deleting files at the ''end'' of WDL workflows. So if you have a large file that you only need for part of your workflow, consider writing the part that creates and uses it as a separate sub-<code>workflow</code> and invoking it with <code>call</code>. Then the file will be cleaned up when the child workflow ends, leaving more space for files created in the parent workflow.
=Additional WDL resources=
For more information on writing and running WDL workflows, see:
* [https://docs.openwdl.org/en/stable/ The WDL documentation]
* [https://www.youtube.com/playlist?list=PL4Q4HssKcxYv5syJKUKRrD8Fbd-_CnxTM The "Learn WDL" video course on YouTube]
* [https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md The WDL 1.1 language specification]